Abstract

The commutative ambiguity $\mathrm{camb}_{G,X}$ of a context-free grammar $G$ with start symbol $X$ assigns to each Parikh vector $v$ the number of distinct leftmost derivations yielding a word with Parikh vector $v$. Based on the results on the generalization of Newton's method to $\omega$-continuous semirings [EKL07b, EKL07a, EKL10], we show how to approximate $\mathrm{camb}_{G,X}$ by means of rational formal power series, and give a lower bound on the convergence speed of these approximations. From the latter result we deduce that $\mathrm{camb}_{G,X}$ itself is rational modulo the generalized idempotence identity $k = k + 1$ (for $k$ some positive integer), and, subsequently, that it can be represented as a weighted sum of linear sets. This extends Parikh's well-known result that the commutative image of context-free languages is semilinear ($k = 1$).

Based on the well-known relationship between context-free grammars and algebraic systems over semirings [CS63, SS78, BR82, Kui97, Boz99], our results extend the work by Green et al. [GKT07] on the computation of the provenance of Datalog queries over commutative $\omega$-continuous semirings.

1 Introduction 

Motivation Recently, Green et al. showed in [GKT07] that several questions regarding the provenance of an answer to a Datalog query reduce to computing the least solution of an algebraic system over an $\omega$-continuous commutative semiring. To illustrate the main idea, consider the following Datalog program¹ that computes the transitive closure of a finite directed graph $\mathcal{G}$:

    trans(X, Y) :- edge(X, Y).
    trans(X, Y) :- trans(X, Z), trans(Z, Y).

Here, X, Y, Z are variables ranging over the nodes $V$ of the graph, the interpretation of the (extensional) predicate edge(X, Y) is given by the edge relation $E$ of $\mathcal{G}$, while the interpretation of the (intensional) predicate trans(X, Y) is implicitly given by the least Herbrand model, i.e. the transitive closure of $\mathcal{G}$. In order to deduce which edges of $\mathcal{G}$ give rise to a positive answer to the query ?- trans(u, w), in [GKT07] the authors assign to each positive literal a unique identifier (for instance, let $\mathcal{A} = \{e_{u,v} \mid (u,v) \in E\}$ and $\mathcal{X} = \{X_{u,v} \mid u, v \in V\}$) and then expand the above query into an abstract algebraic system in the formal parameters $\mathcal{A}$ and the variables $\mathcal{X}$:

\[
X_{u,w} =
\begin{cases}
e_{u,w} + \sum_{v \in V} X_{u,v} X_{v,w} & \text{if } (u,w) \in E, \\
\sum_{v \in V} X_{u,v} X_{v,w} & \text{otherwise.}
\end{cases}
\]

*This work was partially funded by DFG project "Polynomielle Systeme über Semiringen: Grundlagen, Algorithmen, Anwendungen".

†Institut für Informatik, Technische Universität München

¹See e.g. [CGT89] for more details on Datalog.

In order to give a meaning to this system, the right-hand side is interpreted over some semiring $(S, +, \cdot, 0, 1)$, $S$ for short, i.e. the abstract addition and multiplication are interpreted as the addition and multiplication in $S$, and each formal parameter $a \in \mathcal{A}$ is interpreted as an element $h(a) \in S$ by means of a valuation $h: \mathcal{A} \to S$. As is well known [Kui97], each algebraic system has a least solution if $S$ is $\omega$-continuous (see Section 2).

We demonstrate the connection between the Datalog program and the algebraic system by means of two examples. First, the transitive closure itself is essentially the least solution over the Boolean semiring $(\{0, 1\}, \vee, \wedge, 0, 1)$ under the valuation $h(e_{u,w}) = 1$ for all $e_{u,w} \in \mathcal{A}$, i.e. the least solution assigns 1 to $X_{u,w}$ if and only if $(u, w)$ is in the transitive closure. For a somewhat more interesting example, assume we want to analyze why an edge $(u, w)$ is included in the transitive closure. To this end, it suffices to represent a path by the set of its edges, and a set of paths by the set of corresponding sets of edges. This leads naturally to the semiring $(2^{2^{\mathcal{A}}}, \cup, \uplus, \emptyset, \{\emptyset\})$: a semiring element is a set of subsets of edge identifiers, two semiring elements $s_1, s_2$ are added by taking their union $s_1 \cup s_2$, while the (commutative) multiplication is defined by $s_1 \uplus s_2 = \{a_1 \cup a_2 \mid a_1 \in s_1, a_2 \in s_2\}$. Again, we obtain the answer to our question by computing the least solution of the above system over this semiring under the valuation $h(e_{u,w}) = \{\{e_{u,w}\}\}$. For further examples, we refer the reader to [GKT07].
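The two examples can be reproduced with a few lines of code. The following is only a minimal sketch, not the algorithm of [GKT07]: it solves the transitive-closure system by plain fixed-point iteration, which terminates here because both example semirings are finite. The function name `least_solution`, the three-node graph, and the encoding of edge identifiers as pairs are assumptions made for illustration.

```python
from itertools import product

def least_solution(nodes, edges, h, add, mul, zero):
    """Fixed-point iteration for X_{u,w} = [e_{u,w}] + sum_v X_{u,v} * X_{v,w}."""
    X = {(u, w): zero for u, w in product(nodes, repeat=2)}
    while True:
        new = {}
        for u, w in product(nodes, repeat=2):
            # The parameter e_{u,w} only occurs if (u, w) is an edge.
            value = h[(u, w)] if (u, w) in edges else zero
            for v in nodes:
                value = add(value, mul(X[(u, v)], X[(v, w)]))
            new[(u, w)] = value
        if new == X:  # stable: least solution reached (finite semirings only)
            return X
        X = new

# Hypothetical example graph: 1 -> 2 -> 3.
nodes = [1, 2, 3]
edges = {(1, 2), (2, 3)}

# Boolean semiring ({0, 1}, or, and, 0, 1) with h(e) = 1.
reach = least_solution(nodes, edges, {e: True for e in edges},
                       lambda a, b: a or b, lambda a, b: a and b, False)

# Semiring of sets of sets of edge identifiers with h(e) = {{e}}.
why = least_solution(
    nodes, edges,
    {e: frozenset({frozenset({e})}) for e in edges},
    lambda s, t: s | t,                                   # addition: union
    lambda s, t: frozenset(a | b for a in s for b in t),  # pairwise unions
    frozenset(),
)

print(reach[(1, 3)])  # is (1, 3) in the transitive closure?
print(why[(1, 3)])    # which edge sets witness it?
```

Over semirings that are not finite this naive iteration need not terminate, which is one reason for studying better approximation schemes.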

Note that in both examples, multiplication is commutative, and addition is idempotent. Naturally, the question arises over which commutative $\omega$-continuous semirings we can compute or, at least, approximate the least solution of an algebraic system. Of particular interest is the semiring of formal power series whose carrier is the set $\mathbb{N}_\infty\langle\langle \mathbb{N}^{\mathcal{A}} \rangle\rangle$ of functions from Parikh vectors $\mathbb{N}^{\mathcal{A}}$ to the extended natural numbers $\mathbb{N}_\infty = \mathbb{N} \cup \{\infty\}$, as it is free in the following sense: every valuation $h: \mathcal{A} \to S$ into a concrete commutative $\omega$-continuous semiring induces a unique $\omega$-continuous homomorphism $H: \mathbb{N}_\infty\langle\langle \mathbb{N}^{\mathcal{A}} \rangle\rangle \to S$ which maps the least solution over $\mathbb{N}_\infty\langle\langle \mathbb{N}^{\mathcal{A}} \rangle\rangle$ to the least solution over $S$ (we do not distinguish between $h$ and $H$ in the following). See e.g.
[B5S991IUKT07] . 

In general, a finite, explicit representation of the least solution $(s_X \mid X \in \mathcal{X})$ over $\mathbb{N}_\infty\langle\langle \mathbb{N}^{\mathcal{A}} \rangle\rangle$ is not possible (see also Example 3.5). In [GKT07] the authors therefore present two algorithms, All-Trees and Monomial-Coefficient, for computing finitely representable information on this solution: All-Trees decides whether $s_X: \mathbb{N}^{\mathcal{A}} \to \mathbb{N}_\infty$ has only finite support and takes only finite values on its support, and can be used to evaluate Datalog over finite distributive lattices, a special case of commutative $\omega$-continuous semirings; Monomial-Coefficient computes the value of $s_X$ for some Parikh vector $v \in \mathbb{N}^{\mathcal{A}}$. Both algorithms are based on the close relationship between algebraic
systems and cont ext-free grammars (CSMl [SS781 lKui971 EBBQTI ITha67[ BR82 Boz99, E KLOTbl 
lEKLOTal IEKL08] , and work by enumerating the derivation trees of the grammar associated with 
the algebraic system, utilizing the pumping lemma for context-free languages in order to ensure termination. The associated context-free grammar $G = (\mathcal{X}, \mathcal{A}, P)$ with nonterminals $\mathcal{X}$, alphabet $\mathcal{A}$, and productions $P$ is obtained from the algebraic system by reinterpreting the right-hand sides of the algebraic system as rewriting rules for the variables. For instance, the algebraic system for computing the transitive closure translates to the grammar $G$ defined by the rules

\[
X_{u,w} \to X_{u,v} X_{v,w} \text{ for all } u, v, w \in V, \quad\text{and}\quad X_{u,w} \to e_{u,w} \text{ for all } (u,w) \in E.
\]

W.r.t. commutative $\omega$-continuous semirings, the grammar $G$ and the algebraic system are then connected by means of the commutative ambiguity $\mathrm{camb}_{G,X}: \mathbb{N}^{\mathcal{A}} \to \mathbb{N}_\infty$ which assigns to each Parikh vector $v \in \mathbb{N}^{\mathcal{A}}$ the number of leftmost derivations w.r.t. $G$ with start symbol $X$ leading to a word with Parikh vector $v$: we have that $s_X = \mathrm{camb}_{G,X}$ for all $X \in \mathcal{X}$, or $s = \mathrm{camb}_G$ for short. See e.g. [CS63, Boz99, EKL07b].
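To make the identity between the least solution and the commutative ambiguity concrete, the following sketch computes the coefficients of $\mathrm{camb}_{G,X}$ up to a bound on the word length by fixed-point iteration on truncated power series. It is an illustration under stated assumptions, not an algorithm from the cited work: the function names, the dictionary encoding of the grammar, and the sample grammar $X \to aXX \mid c$ are hypothetical, and the grammar is assumed to have only finitely many derivation trees per word (otherwise coefficients would be $\infty$ and the loop would not stop).

```python
from collections import Counter

def parikh(word, alphabet):
    """Parikh vector of a word as a tuple indexed like `alphabet`."""
    return tuple(word.count(a) for a in alphabet)

def mul(s, t, bound):
    """Cauchy product over N^A, dropping monomials of total degree > bound."""
    out = Counter()
    for u, cu in s.items():
        for v, cv in t.items():
            w = tuple(x + y for x, y in zip(u, v))
            if sum(w) <= bound:
                out[w] += cu * cv
    return out

def camb(rules, alphabet, bound):
    """Coefficients of camb_{G,X} for all Parikh vectors of total degree <= bound."""
    zero = tuple(0 for _ in alphabet)
    val = {X: Counter() for X in rules}
    while True:
        new = {}
        for X, rhss in rules.items():
            total = Counter()
            for rhs in rhss:
                term = Counter({zero: 1})
                for sym in rhs:
                    if sym in rules:   # variable: plug in current approximation
                        factor = val[sym]
                    else:              # terminal: a single monomial
                        factor = Counter({parikh(sym, alphabet): 1})
                    term = mul(term, factor, bound)
                total.update(term)     # Counter.update adds coefficients
            new[X] = total
        if new == val:
            return val
        val = new

# Hypothetical grammar X -> aXX | c; its derivation trees are binary trees.
rules = {"X": [["a", "X", "X"], ["c"]]}
result = camb(rules, ("a", "c"), bound=9)
print(sorted(result["X"].items()))
```

For this grammar the coefficient at the Parikh vector with $m$ letters $a$ and $m + 1$ letters $c$ is the $m$-th Catalan number, since every derivation tree corresponds to exactly one leftmost derivation.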

Contribution and related work In this article, we study how to construct from a given context-free grammar $G$ a sequence $G^{[1]}, G^{[2]}, \ldots$ of nonexpansive² context-free grammars that underapproximate the ambiguity of $G$ ($\mathrm{amb}_{G^{[k]},X}(w) \le \mathrm{amb}_{G,X}(w)$ for all $w \in \mathcal{A}^*$, Lemma 3.2), and, thus, also the commutative ambiguity. As $G^{[k]}$ is nonexpansive, it is straightforward to show that $\mathrm{camb}_{G^{[k]},X}$ is rational in $\mathbb{N}_\infty\langle\langle \mathbb{N}^{\mathcal{A}} \rangle\rangle$, and a rational expression representing $\mathrm{camb}_{G^{[k]},X}$ can easily be computed from $G^{[k]}$ (Theorem 3.4). We then give a lower bound on the speed at which $\mathrm{camb}_{G^{[k]},X}$ converges to $\mathrm{camb}_{G,X}$: letting $n$ be the number of variables of $G$, we show that for every positive integer $k$ and every $v \in \mathbb{N}^{\mathcal{A}}$ we have that, if $\mathrm{camb}_{G^{[n+k]},X}(v) < \mathrm{camb}_{G,X}(v)$, then at least $k < \mathrm{camb}_{G^{[n+k]},X}(v)$ (Theorem 4.2).

An immediate consequence of these results is an algorithm for evaluating Datalog queries over "collapsed" commutative semirings: call an $\omega$-continuous semiring $S$ collapsed at some positive integer $k$ if in $S$ the identity $k = k + 1$ holds³; given a valuation $h: \mathcal{A} \to S$ into a commutative $\omega$-continuous semiring collapsed at $k$, the least solution can be obtained by evaluating the corresponding rational expressions for $\mathrm{camb}_{G^{[n+k]}}$ under the homomorphism induced by $h$.

In particular, this yields an algorithm for evaluating Datalog queries over the tropical semiring $(\mathbb{N}_\infty, \min, +, \infty, 0)$; this answers an open question of [GKT07]. We remark that in [EKL08] more efficient algorithms for the classes of star-distributive semirings, subsuming the tropical semiring, and of one-bounded semirings, subsuming finite distributive lattices, are presented.

Finally, we show that $\mathrm{camb}_{G,X}$ can be represented modulo $k = k + 1$ as a finite sum $\gamma_1 \mathbb{1}_{C_1} + \ldots + \gamma_r \mathbb{1}_{C_r}$ of weighted characteristic functions $\mathbb{1}_{C_i}$ of linear sets $C_i \subseteq \mathbb{N}^{\mathcal{A}}$ with weights $\gamma_i \in \{0, 1, \ldots, k\}$ (Theorem 5.3)⁴. This completes the extension of Parikh's well-known theorem that the commutative image of a context-free grammar is a semilinear set ($k = 1$).

These results continue the study of Newton's method over $\omega$-continuous semirings presented in [EKL07b, EKL07a, EKL10]. There it was shown that Newton's method, as known from calculus, also applies to the setting of algebraic systems over $\omega$-continuous semirings, and always converges to the least solution at least as fast as (and often much faster than) the standard fixed-point iteration. Although it is shown in [EKL07a, EKL10] that Newton's method is well-defined on any $\omega$-continuous semiring, the definition does not yield an effective way of applying Newton's method, as it requires the user to supply at each iteration a semiring element which represents a certain difference. Only for special cases is it stated how to compute those differences; a general construction is missing in these articles.

The grammars $G^{[k]}$ defined in Definition 3.1 address this shortcoming. Their construction is based on the notion of "tree dimension" introduced in [EKL07b] to characterize the structure of terms evaluated by Newton's method, where it was shown that the $k$-th Newton approximation of the least solution of an algebraic system corresponds exactly to the derivation trees of dimension at most $k$ generated by the context-free grammar associated with the system. This allows us to explicitly define a grammar, resp. equation system, which captures exactly the update computed by Newton's method within a single step. That is, we may define the difference of two consecutive Newton approximations over any $\omega$-continuous semiring by constructing a grammar which generates exactly the derivation trees of $G$ of dimension exactly $k$. By taking the sum of all these updates, we obtain the grammar $G^{[k]}$, which generates exactly the derivation trees of $G$ of dimension at most $k$. Hence, if the least solution of (the equation system associated with) $G^{[k-1]}$ is known, we only need to solve the equation system corresponding to the derivation trees of dimension exactly $k$. We remark that this construction does not require multiplication to be commutative; it is merely a partition of the regular tree language of derivation trees of $G$.

²A context-free grammar is nonexpansive if every variable $X$ derives only sentential forms containing $X$ at most once [GS68].

³Where $k$ denotes the term $1 + \ldots + 1$ consisting of the corresponding number of 1s. For instance, any $\omega$-continuous idempotent semiring is "collapsed" at 1. See also [BE09] for a much more general discussion of these semirings.

⁴$C \subseteq \mathbb{N}^{\mathcal{A}}$ is linear if $C = \{v_0 + \sum_{i=1}^{l} \lambda_i v_i \mid \lambda_1, \ldots, \lambda_l \in \mathbb{N}\}$ for vectors $v_0, \ldots, v_l \in \mathbb{N}^{\mathcal{A}}$.

If multiplication is commutative, $\mathrm{camb}_{G^{[k]}}$ represents the $k$-th Newton approximation over any commutative $\omega$-continuous semiring. Similarly, the bound on the speed at which $\mathrm{camb}_{G^{[k]}}$ converges to $\mathrm{camb}_G$ given in Theorem 4.2 generalizes the result of [EKL07b] on the convergence of Newton's method over idempotent commutative $\omega$-continuous semirings.

If multiplication is not commutative, we may not represent the least solution of $G^{[k]}$ as regular expressions, but only as regular tree expressions with the particular property that tree substitution only occurs at a unique leaf. It might be worthwhile to study if there are interesting (distributive) abstract interpretations whose widening operator can take advantage of this representation.

Structure of the paper In Section 2 we recall the most fundamental definitions, in particular the definition of the dimension of a tree. We then show in Section 3 how to unfold a given context-free grammar $G$ into a new context-free grammar $G^{[k]}$ that generates exactly those derivation trees of $G$ that are of dimension at most $k$ and, thus, represents exactly the $k$-th Newton approximation. We show that the commutative ambiguity of each grammar $G^{[k]}$ is rational over $\mathbb{N}_\infty\langle\langle \mathbb{N}^{\mathcal{A}} \rangle\rangle$. In Section 4 we give a lower bound on the speed at which the ambiguity of $G^{[k]}$ converges to that of $G$. We use this result in Section 5 to obtain from a rational expression for $\mathrm{camb}_{G^{[k]}}$ a semilinear representation of $\mathrm{camb}_G$ modulo the generalized idempotence assumption of $k = k + 1$, thereby completing the extension of Parikh's theorem from $k = 1$ to arbitrary $k$.

All proofs can be found in the appendix. 

2 Preliminaries 

The power set of a set $M$ is denoted by $2^M$. For $k \in \mathbb{N}$, set $[k] := \{1, 2, \ldots, k\}$ with $[0] = \emptyset$. The natural numbers extended by a greatest element $\infty$, and the natural numbers "collapsed" at a given positive integer $k$ are denoted by $\mathbb{N}_\infty$, and $\mathbb{N}_k = \{0, 1, \ldots, k\}$, respectively. For $a \in \mathbb{N}_\infty$ set $a + \infty = \infty$, $0 \cdot \infty = 0$ and $a \cdot \infty = \infty$ if $a \neq 0$. Addition and multiplication are defined on $\mathbb{N}_k$ by identifying $k$ with $\infty$.
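The arithmetic of $\mathbb{N}_\infty$ and $\mathbb{N}_k$ can be sketched as follows. This is one possible encoding chosen for illustration: $\infty$ is represented by `math.inf`, and the helper names `add_inf`, `mul_inf`, and `make_collapsed` are assumptions, not notation from the text. Identifying $k$ with $\infty$ amounts to capping sums and products at $k$.

```python
import math

INF = math.inf  # stands for the greatest element of N_infty

def add_inf(a, b):
    """Addition on N_infty: a + inf = inf."""
    return a + b

def mul_inf(a, b):
    """Multiplication on N_infty with the convention 0 * inf = 0."""
    return 0 if a == 0 or b == 0 else a * b

def make_collapsed(k):
    """Addition and multiplication on N_k = {0, ..., k}, with k playing the role of inf."""
    def add(a, b):
        return min(a + b, k)
    def mul(a, b):
        return min(a * b, k)
    return add, mul

add3, mul3 = make_collapsed(3)
print(add3(3, 1))       # the identity k = k + 1 for k = 3
print(mul3(3, 0))       # k * 0 = 0, mirroring inf * 0 = 0
print(mul_inf(0, INF))  # 0 * inf = 0
```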

The set of words over the (finite) alphabet $\mathcal{A}$ is denoted by $\mathcal{A}^*$ with $\varepsilon = ()$ the empty word. The length of a word $w \in \mathcal{A}^*$ is denoted by $|w|$. The Parikh map is $c: \mathcal{A}^* \to \mathbb{N}^{\mathcal{A}}: w \mapsto (c_a(w) \mid a \in \mathcal{A})$ where $c_a(w)$ denotes the number of occurrences of $a$ in $w$.
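As a small illustration of the Parikh map, the sketch below represents a Parikh vector as a tuple ordered like the alphabet. The function name `parikh` and this tuple encoding are assumptions made for the example.

```python
def parikh(word, alphabet):
    """Parikh vector c(w): number of occurrences of each letter of `alphabet` in `word`."""
    return tuple(word.count(a) for a in alphabet)

print(parikh("abcab", "abc"))
# Words that are permutations of each other have the same Parikh vector.
print(parikh("ab", "ab") == parikh("ba", "ab"))
```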

Let $\Sigma$ be a finite ranked set (signature) where $\Sigma_r$ denotes the subset of $\Sigma$ consisting of exactly those symbols having arity $r$. Then $T_\Sigma$ denotes the set of $\Sigma$-terms where we use Polish notation so that $T_\Sigma \subseteq \Sigma^*$. When $t \in T_\Sigma$, we denote by $t = \sigma t_1 \ldots t_r$ that $\sigma \in \Sigma_r$ and $t_1, \ldots, t_r \in T_\Sigma$ are the uniquely determined subterms; for inductive definitions, we set $t = \sigma t_1 \ldots t_r = \sigma$ if $r = 0$. $T_\Sigma$ is canonically identified with the set of finite, $\Sigma$-labeled, rooted trees: the rooted tree underlying $t = \sigma t_1 \ldots t_r$ has as nodes the set $V_t = \{\varepsilon\} \cup \{i\pi \mid i \in [r], \pi \in V_{t_i}\}$ with $\varepsilon$ the root, and the edges $E_t := \{(\pi, \pi i) \mid \pi i \in V_t\}$ pointing away from the root. The label $\mathrm{lbl}_t(\cdot)$ of a node in $V_t$ is then defined inductively by $\mathrm{lbl}_t(\varepsilon) = \sigma$ and $\mathrm{lbl}_t(i\pi) = \mathrm{lbl}_{t_i}(\pi)$ for $t = \sigma t_1 \ldots t_r$. The height $\mathrm{hgt}(t)$ of a tree $t = \sigma t_1 \ldots t_r$ is defined to be 0 if $r = 0$, and otherwise by $\mathrm{hgt}(t) = 1 + \max_{i \in [r]} \mathrm{hgt}(t_i)$. Analogously, define the subtree $t|_\pi$ of $t$ rooted at $\pi$, and the tree $t[t'/\pi]$ obtained by substituting the tree $t'$ for $t|_\pi$ inside of $t$.

Definition 2.1.

The dimension $\dim(t)$ of $t = \sigma t_1 \ldots t_r \in T_\Sigma$ is defined to be $\dim(t) = 0$ if $r = 0$; otherwise let $d = \max_{i \in [r]} \dim(t_i)$, and set $\dim(t) = d$ if there is a unique child $i \in [r]$ of dimension $d$, else set $\dim(t) = d + 1$.

From the definition it easily follows that $\dim(t)$ is the height of the greatest perfect binary tree that can be obtained from the rooted tree $(V_t, E_t)$ via edge contractions. Thus, $\dim(t)$ is bounded from above by $\mathrm{hgt}(t)$.

Example 2.2.

Assume $\Sigma = \{a, b\}$ with $a \in \Sigma_2$ and $b \in \Sigma_0$. Then $aabbaabbb \in T_\Sigma$ is identified with the following tree, listed level by level as "node: label":

    ε: a
    1: a    2: a
    11: b    12: b    21: a    22: b
    211: b    212: b

For instance, the node 212 is labeled by $b$. Computing the dimension bottom-up, we obtain $\dim(t|_{21}) = 1$, $\dim(t|_2) = 1$, $\dim(t|_1) = 1$, and $\dim(t) = 2$.
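The computation in Example 2.2 can be checked mechanically. The sketch below parses a term in Polish notation and evaluates height and dimension by direct recursion on the definitions; the representation of a tree as a `(label, children)` pair and the function names are choices made for this illustration, and the height uses the convention that a single node has height 0.

```python
def parse(term, arity, pos=0):
    """Parse a term in Polish notation; return (tree, position after the term)."""
    label = term[pos]
    pos += 1
    children = []
    for _ in range(arity[label]):
        child, pos = parse(term, arity, pos)
        children.append(child)
    return (label, children), pos

def hgt(tree):
    """Height: 0 for a leaf, otherwise 1 + the maximal height of a child."""
    _, children = tree
    return 0 if not children else 1 + max(hgt(c) for c in children)

def dim(tree):
    """Dimension: the maximal child dimension d if attained exactly once, else d + 1."""
    _, children = tree
    if not children:
        return 0
    dims = [dim(c) for c in children]
    d = max(dims)
    return d if dims.count(d) == 1 else d + 1

t, end = parse("aabbaabbb", {"a": 2, "b": 0})
left, right = t[1]        # the subtrees t|_1 and t|_2
print(dim(left), dim(right), dim(t), hgt(t))
```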

The tree dimension $\dim(t)$ is also known as the Horton-Strahler number [Hor45, Str52], or the register number [Ers58, FFV79, DK95], and is closely related to the pathwidth [RS83] $\mathrm{pw}(T)$ of the tree $T = (V_t, E_t)$ underlying $t$: it can be shown that $\mathrm{pw}(T) - 1 \le \dim(t) \le 2\,\mathrm{pw}(T) + 1$.

Semirings We recall the basic results on semirings (see e.g. [Kui97, DK09]). A semiring $(S, +, \cdot, 0, 1)$ consists of a commutative additive monoid $(S, +, 0)$ and a multiplicative monoid $(S, \cdot, 1)$ where multiplication distributes over addition from both left and right, and multiplication by 0 always evaluates to 0. We simply write $S$ for $(S, +, \cdot, 0, 1)$ if the signature is clear from the context. $S$ is commutative if its multiplication is commutative. $S$ is naturally ordered if the relation $a \sqsubseteq b$ defined by $a \sqsubseteq b :\Leftrightarrow \exists d \in S: a + d = b$ is a partial order on $S$; then 0 is the least element.

A partial order $(P, \le)$ is $\omega$-continuous if for every monotonically increasing sequence ($\omega$-chain) $(a_i)_{i \in \mathbb{N}}$, i.e. $a_i \le a_{i+1}$ for all $i \in \mathbb{N}$, the supremum $\sup_{i \in \mathbb{N}} a_i$ exists in $(P, \le)$; a function $f: (P, \le) \to (P, \le)$ is called $\omega$-continuous if for every $\omega$-chain $(a_i)_{i \in \mathbb{N}}$ we have $f(\sup_{i \in \mathbb{N}} a_i) = \sup_{i \in \mathbb{N}} f(a_i)$. We say that $S$ is $\omega$-continuous if $(S, \sqsubseteq)$ is $\omega$-continuous, and addition and multiplication are both $\omega$-continuous in every argument. In any $\omega$-continuous semiring finite summation $\sum$ can be extended to countable sequences and families by means of $\sum_{i \in \mathbb{N}} a_i := \sup_{k \in \mathbb{N}} \sum_{i \le k} a_i$. The Kleene star $^*: S \to S$ is defined by $a^* := \sum_{i \in \mathbb{N}} a^i$.

If not stated otherwise, we always assume that $\mathbb{N}_\infty$ carries the semiring structure $(\mathbb{N}_\infty, +, \cdot, 0, 1)$ with addition and multiplication as stated above so that $1^* = \infty$. For any $\omega$-continuous semiring $S$ there is exactly one $\omega$-continuous homomorphism $h$ from $\mathbb{N}_\infty$ to $S$, as $h(0) = 0$, $h(1) = 1$, and $h(\infty) = h(1^*) = 1^*$ have to hold; we therefore embed $\mathbb{N}_\infty$ into $S$ by means of this unique homomorphism.

For a commutative semiring $(S, +, \cdot, 0, 1)$, and a finitely decomposable⁵ monoid $(M, \circ, e)$ we recall the definition of the semiring $S\langle\langle M \rangle\rangle$ of formal power series. Its carrier is the set of total functions from $M$ to $S$. For $s \in S\langle\langle M \rangle\rangle$ denote by $(s, m)$ the value of $s$ at $m \in M$. Then addition on $S$ is extended pointwise to $S\langle\langle M \rangle\rangle$, while multiplication is defined by means of the generalized Cauchy product, i.e.:

\[
(s + t, m) = (s, m) + (t, m) \quad\text{and}\quad (s \cdot t, m) = \sum_{u, v \in M:\ u \circ v = m} (s, u) \cdot (t, v).
\]

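That is, we treat $s \in S\langle\langle M \rangle\rangle$ as a (formal) power series $\sum_{m \in M} (s, m)\, m$ with $(s, m)$ the coefficient of the monomial $m$. If the support $\mathrm{supp}(s) = \{m \in M \mid (s, m) \neq 0\}$ is finite, then $s$ is called a (formal) polynomial. The subset of polynomials is denoted by $S\langle M \rangle$. The semiring $S$ and the monoid $M$ are canonically embedded into $S\langle\langle M \rangle\rangle$ by means of the monomorphisms $h_S: S \to S\langle\langle M \rangle\rangle: s \mapsto s\,e$ and $h_M: M \to S\langle\langle M \rangle\rangle: m \mapsto 1\,m$, respectively. W.r.t. these definitions $S\langle\langle M \rangle\rangle$ and $S\langle M \rangle$ become semirings with neutral elements $0 = h_S(0)$ and $1 = h_S(1) = h_M(e)$; if $S$ is $\omega$-continuous, then so is $S\langle\langle M \rangle\rangle$, and the Kleene star is defined everywhere on $S\langle\langle M \rangle\rangle$. For instance, $S\langle\langle M \rangle\rangle$ is $\omega$-continuous for $S$ either $\mathbb{N}_\infty$ or $\mathbb{N}_k$, and $M$ either $\mathcal{A}^*$ or $\mathbb{N}^{\mathcal{A}}$; but $\mathbb{N}\langle\langle \mathcal{A}^* \rangle\rangle$ and $\mathbb{N}\langle\langle \mathbb{N}^{\mathcal{A}} \rangle\rangle$ are not. Note that $\mathbb{N}_\infty\langle\langle \mathcal{A}^* \rangle\rangle$ is free in the following sense: let $(S, +, \cdot, 0_S, 1_S)$ be some $\omega$-continuous semiring; then every valuation $h: \mathcal{A} \to S$ extends uniquely to an $\omega$-continuous homomorphism $h: \mathbb{N}_\infty\langle\langle \mathcal{A}^* \rangle\rangle \to S$ defined by $h(s) = \sum_{w \in \mathcal{A}^*} (s, w)\, h(w)$. Similarly, $\mathbb{N}_\infty\langle\langle \mathbb{N}^{\mathcal{A}} \rangle\rangle$ is a representation of the free commutative $\omega$-continuous semiring generated by $\mathcal{A}$, and, thus, isomorphic to $\mathbb{N}_\infty\langle\langle \mathcal{A}^* \rangle\rangle$ modulo commutativity.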

Let $S$ be commutative and $\omega$-continuous so that the Kleene star is defined for every power series in $S\langle\langle M \rangle\rangle$. A power series $r \in S\langle\langle M \rangle\rangle$ is called rational if it can be constructed from the elements of $S$ and $M$ by means of the rational operations addition, multiplication, and Kleene star, i.e. if either $r \in S$, or $r \in M$, or $r = (r_1 + r_2)$, or $r = r_1 \cdot r_2$, or $r = r_1^*$ for $r_1, r_2$ rational in $S\langle\langle M \rangle\rangle$. A rational expression (over $M$ with weights in $S$) is any term constructed from elements of $S$ and $M$, and the rational operations. For every rational series $r$ in $S\langle\langle M \rangle\rangle$ there is a rational expression $\rho$ which evaluates to $r$ over $S\langle\langle M \rangle\rangle$. By our assumption that $S$ is $\omega$-continuous, every rational expression in turn evaluates to a rational series $r$ over $S\langle\langle M \rangle\rangle$. Note that $\omega$-continuous homomorphisms preserve rationality.

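Context-free grammars A context-free grammar $G = (\mathcal{X}, \mathcal{A}, P)$ consists of variables $\mathcal{X}$, an alphabet $\mathcal{A}$, and rules $P \subseteq \mathcal{X} \times (\mathcal{A} \cup \mathcal{X})^*$. By $(G, X)$ we denote the grammar $G$ with start symbol $X \in \mathcal{X}$. For a rule $(X, \gamma) \in P$ we also write $X \to_G \gamma$ or simply $X \to \gamma$ if $G$ is apparent from the context. $\Rightarrow_G$ denotes the binary relation on $(\mathcal{A} \cup \mathcal{X})^*$ induced by the rules $P$, i.e., if $X \to_G w$, then $\alpha X \beta \Rightarrow_G \alpha w \beta$ for all $\alpha, \beta \in (\mathcal{A} \cup \mathcal{X})^*$. The (reflexive) transitive closure of $\Rightarrow_G$ is denoted by $\Rightarrow_G^*$. The language generated by $(G, X)$ is $L(G, X) = \{w \in \mathcal{A}^* \mid X \Rightarrow_G^* w\}$.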

⁵A monoid $(M, \circ, e)$ is finitely decomposable if for every $m \in M$ there exist only finitely many pairs $(u, v) \in M \times M$ such that $u \circ v = m$. This ensures that the Cauchy product is also well-defined over semirings $S$ which are not $\omega$-continuous.

Let $\Sigma_G$ denote the set $\{\sigma_{X,\gamma} \mid X \to_G \gamma\}$ and define the arity of $\sigma_{X,\gamma}$ to be the number of variables occurring in $\gamma$. Define the new context-free grammar $G_T$ with alphabet $\Sigma_G$ by setting $X \to_{G_T} \sigma_{X,\gamma} X_1 \ldots X_r$ for $\gamma = \gamma_0 X_1 \gamma_1 \ldots \gamma_{r-1} X_r \gamma_r$. Then $T_{G,X} := L(G_T, X) \subseteq T_{\Sigma_G}$ is called the set of $(G, X)$-trees (or simply $X$-trees if $G$ is apparent from the context) and $T_{G,X}$ "yields" $L(G, X)$ in the sense of [Tha67, BR82, Boz99, EKL07b]: the word represented by a tree $t \in T_{G,X}$ is called its yield $Y(t)$ and is inductively defined by $Y(t) = u_0 Y(t_1) u_1 \ldots u_{r-1} Y(t_r) u_r$ for $t = \sigma_{X,\gamma} t_1 \ldots t_r$ and $\gamma = u_0 X_1 u_1 \ldots u_{r-1} X_r u_r$. We then have $L(G, X) = \{Y(t) \mid t \in T_{G,X}\}$, and

\[
\mathrm{amb}_{G,X}(w) = |\{t \in T_{G,X} \mid Y(t) = w\}| \quad\text{and}\quad \mathrm{camb}_{G,X}(v) = |\{t \in T_{G,X} \mid c(Y(t)) = v\}|,
\]

where $\mathrm{amb}_{G,X} \in \mathbb{N}_\infty\langle\langle \mathcal{A}^* \rangle\rangle$, $\mathrm{camb}_{G,X} \in \mathbb{N}_\infty\langle\langle \mathbb{N}^{\mathcal{A}} \rangle\rangle$ and $L(G, X) = \mathrm{supp}(\mathrm{amb}_{G,X}) \in \mathbb{N}_1\langle\langle \mathcal{A}^* \rangle\rangle$.
The dimension of a derivation tree is closely related to the index of a derivation.

Definition 2.3 (see e.g. [GS68]).

The index of a derivation is the maximal number of variables occurring in any sentential form of the derivation.

Definition 2.4.

For $G$ a context-free grammar and $t \in T_{G,X}$, let $\mathrm{minidx}(t)$ be the minimum index taken over all derivations associated with $t$.

Lemma 2.5 ([EKL07a, EGKL11]).

Let $G$ be a context-free grammar and $r_{\max}$ the maximal arity of a symbol in $\Sigma_G$. Then: $\dim(t) \le \mathrm{minidx}(t) \le \dim(t) \cdot (r_{\max} - 1) + 1$.

Example 2.6.

Consider $G$ defined by the productions:

\[
X \to YaYaY, \qquad Y \to X, \qquad Y \to b.
\]

Then $\Sigma_G = \{\sigma_{X,YaYaY}, \sigma_{Y,X}, \sigma_{Y,b}\}$. The leftmost derivation

\[
X \Rightarrow YaYaY \Rightarrow XaYaY \Rightarrow YaYaYaYaY \Rightarrow^+ babababab
\]

has index 5, and corresponds to the derivation tree

\[
t = \sigma_{X,YaYaY}\ \sigma_{Y,X}\ \sigma_{X,YaYaY}\ \sigma_{Y,b}\ \sigma_{Y,b}\ \sigma_{Y,b}\ \sigma_{Y,b}\ \sigma_{Y,b}
\]

depicted, level by level as "node: label", as

    ε: σ_{X,YaYaY}
    1: σ_{Y,X}    2: σ_{Y,b}    3: σ_{Y,b}
    11: σ_{X,YaYaY}
    111: σ_{Y,b}    112: σ_{Y,b}    113: σ_{Y,b}

This tree has dimension 1. A derivation of minimal index first processes the subtrees $t|_2$ and $t|_3$, leading to an index of 3.
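The tree of Example 2.6 can be replayed in code to confirm its yield and dimension, and to compare with the bounds of Lemma 2.5. This is a sketch under assumptions: trees are encoded as `((lhs, rhs), children)`, every symbol is a single character, and the names `tree_yield`, `dim`, and `VARIABLES` are introduced only for this illustration.

```python
VARIABLES = {"X", "Y"}

def tree_yield(tree):
    """Yield Y(t): terminals of the rule interleaved with the yields of the children."""
    (_, rhs), children = tree
    kids = iter(children)
    return "".join(tree_yield(next(kids)) if s in VARIABLES else s for s in rhs)

def dim(tree):
    """Tree dimension as in Definition 2.1."""
    _, children = tree
    if not children:
        return 0
    dims = [dim(c) for c in children]
    d = max(dims)
    return d if dims.count(d) == 1 else d + 1

leaf = (("Y", "b"), [])                     # sigma_{Y,b}
inner = (("X", "YaYaY"), [leaf, leaf, leaf])
t = (("X", "YaYaY"), [(("Y", "X"), [inner]), leaf, leaf])

r_max = 3                                   # maximal arity in Sigma_G
print(tree_yield(t))                        # the word derived in the example
print(dim(t), dim(t) * (r_max - 1) + 1)     # lower and upper bound on minidx(t)
```

For this tree the upper bound of Lemma 2.5 evaluates to 3, which is exactly the minimal index stated in the example, so the bound is tight here.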
3 Unfolding

In this section, we describe how to unfold a given context-free grammar $G = (\mathcal{X}, \mathcal{A}, P)$ into a new context-free grammar $G^{[k]}$ which generates exactly the trees of dimension at most $k$ (Definition 3.1 and Lemma 3.2). Hence, $\mathrm{amb}_{G^{[k]}} \le \mathrm{amb}_G$. By construction, $G^{[k]}$ is nonexpansive, i.e. every variable $X$ can only be derived into sentential forms in which $X$ occurs at most once [GS68, Ynt67]. From this, it easily follows that the commutative ambiguity $\mathrm{camb}_{G^{[k]}}$ is a rational power series in $\mathbb{N}_\infty\langle\langle \mathbb{N}^{\mathcal{A}} \rangle\rangle$ (Theorem 3.4).

We first give an informal description of the notation used in the definition of $G^{[k]}$: given the bound $k$ on the maximal dimension we split every variable $X \in \mathcal{X}$ of $G$ into the variables $X^{(d)}$ and $X^{[d]}$ where $d \in \{0, 1, \ldots, k\}$, with the intended meaning that $X^{(d)}$ resp. $X^{[d]}$ generates all $X$-trees of dimension exactly resp. at most $d$; a variable $X^{[d]}$ can only be rewritten to $X^{(d')}$ for some $d' \le d$, i.e. the dimension of the tree to be generated from $X^{[d]}$ has to be chosen nondeterministically; the rules rewriting the variable $X^{(d)}$ are derived from the rules $X \to \gamma$ by replacing each variable $Y$ occurring in $\gamma$ by either $Y^{(d')}$ or $Y^{[d']}$ for some $d' \le d$ in such a way that, inductively, it is guaranteed that every $X$-tree of dimension exactly $d$ is generated exactly once. In particular, as for each $X$-tree $t = \sigma t_1 \ldots t_r$ there is at most one $i \in [r]$ with $\dim(t) = \dim(t_i)$, the grammar $G^{[k]}$ is nonexpansive.

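Definition 3.1.

Let $G = (\mathcal{X}, \mathcal{A}, P)$ be a context-free grammar, and let $k$ be a fixed natural number. Set $\mathcal{X}^{[k]} := \{X^{(d)}, X^{[d]} \mid X \in \mathcal{X}, 0 \le d \le k\}$. The grammar $G^{[k]} = (\mathcal{X}^{[k]}, \mathcal{A}, P^{[k]})$ then consists of exactly the following rules:

• $X^{[d]} \to X^{(e)}$ for every $d \in [k] \cup \{0\}$, and every $e \in [d] \cup \{0\}$.

• If $X \to_G u_0$, then $X^{(0)} \to_{G^{[k]}} u_0$.

• If $X \to_G u_0 X_1 u_1$, then $X^{(d)} \to_{G^{[k]}} u_0 X_1^{(d)} u_1$ for every $d \in [k] \cup \{0\}$.

• If $X \to_G u_0 X_1 u_1 \ldots u_{r-1} X_r u_r$ with $r > 1$:

  - For every $d \in [k]$, and every $j \in [r]$: set $Z_j := X_j^{(d)}$ and $Z_i := X_i^{[d-1]}$ for all $i \in [r] - \{j\}$. Then:

    $X^{(d)} \to_{G^{[k]}} u_0 Z_1 u_1 \ldots u_{r-1} Z_r u_r$.

  - For every $d \in [k]$, and every $J \subseteq [r]$ with $|J| \ge 2$: set $Z_i := X_i^{(d-1)}$ if $i \in J$ and $Z_i := X_i^{[d-2]}$ if $i \notin J$. If all $Z_i$ are defined, i.e., $d \ge 2$ whenever some $i \notin J$ exists, then:

    $X^{(d)} \to_{G^{[k]}} u_0 Z_1 u_1 \ldots u_{r-1} Z_r u_r$.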
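The unfolding can be sketched as a short program. This is one reading of Definition 3.1, not a reference implementation: rules are encoded as `(X, rhs)` pairs with `rhs` a tuple of symbols, the variables are taken to be exactly the left-hand sides, and the variables $X^{(d)}$ and $X^{[d]}$ are encoded as the tuples `(X, "exact", d)` and `(X, "atmost", d)`. All of these names are assumptions made for illustration.

```python
from itertools import combinations

def unfold(rules, k):
    """Return the rules of G^[k] as (lhs, rhs) pairs."""
    variables = {X for X, _ in rules}
    exact = lambda X, d: (X, "exact", d)     # X^(d): dimension exactly d
    atmost = lambda X, d: (X, "atmost", d)   # X^[d]: dimension at most d
    out = []
    for X in sorted(variables):
        for d in range(k + 1):
            for e in range(d + 1):
                out.append((atmost(X, d), (exact(X, e),)))
    for X, rhs in rules:
        pos = [i for i, s in enumerate(rhs) if s in variables]
        r = len(pos)
        if r == 0:
            out.append((exact(X, 0), tuple(rhs)))
        elif r == 1:
            for d in range(k + 1):
                new = list(rhs)
                new[pos[0]] = exact(rhs[pos[0]], d)
                out.append((exact(X, d), tuple(new)))
        else:
            for d in range(1, k + 1):
                # Exactly one child j carries the dimension d.
                for j in pos:
                    new = list(rhs)
                    for i in pos:
                        new[i] = exact(rhs[i], d) if i == j else atmost(rhs[i], d - 1)
                    out.append((exact(X, d), tuple(new)))
                # At least two children of dimension exactly d - 1.
                for size in range(2, r + 1):
                    for J in combinations(pos, size):
                        if size < r and d < 2:
                            continue         # X^[d-2] would be undefined
                        new = list(rhs)
                        for i in pos:
                            new[i] = exact(rhs[i], d - 1) if i in J else atmost(rhs[i], d - 2)
                        out.append((exact(X, d), tuple(new)))
    return out

# Hypothetical grammar X -> aXX | c, unfolded for k = 1.
small = unfold([("X", ("a", "X", "X")), ("X", ("c",))], 1)
for lhs, rhs in small:
    print(lhs, "->", rhs)
```

In every generated rule at most one variable on the right-hand side can again reach the left-hand side, which is the nonexpansiveness observed in the informal description.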

As the sets of variables of $G$ and $G^{[k]}$ are disjoint, in the following, we simply write $\mathrm{amb}_X$ for $\mathrm{amb}_{G,X}$, $\mathrm{amb}_{X^{[d]}}$ for $\mathrm{amb}_{G^{[k]},X^{[d]}}$, $X$-tree for $(G, X)$-tree, and so on.

Lemma 3.2.

Every $X^{(d)}$-tree resp. $X^{[d]}$-tree has dimension exactly resp. at most $d$. There is a yield-preserving bijection between the $X^{(d)}$-trees resp. $X^{[d]}$-trees and the $X$-trees of dimension exactly resp. at most $d$.

Corollary 3.3.

$\mathrm{amb}_{X^{[k]}}(w) = |\{t \in T_{G,X} \mid Y(t) = w \wedge \dim(t) \le k\}|$ for all $X \in \mathcal{X}$.

Theorem 3.4.

Let $G = (\mathcal{X}, \mathcal{A}, P)$ be a context-free grammar.

1. $\mathrm{camb}_{X^{[k]}}$ is rational in $\mathbb{N}_\infty\langle\langle \mathbb{N}^{\mathcal{A}} \rangle\rangle$.

2. There is a $k \in \mathbb{N}$ such that $\mathrm{amb}_{X^{[k]}} = \mathrm{amb}_X$ for all $X \in \mathcal{X}$ if and only if $G$ is nonexpansive. Further, if such a $k$ exists, then $k < |\mathcal{X}|$. Analogously for $\mathrm{camb}_{X^{[k]}} = \mathrm{camb}_X$.

Proof. The first claim that $\mathrm{camb}_{X^{[k]}}$ is expressible by a weighted rational expression follows directly from the structure of the unfolding $G^{[k]}$. With $G^{[k]}$ we associate an algebraic system over $\mathbb{N}_\infty\langle\langle \mathbb{N}^{\mathcal{A}} \rangle\rangle$ defined by the equations $X = \sum_{X \to \gamma} \gamma$. The least solution of this system is exactly camb. For $k = 0$ we have only rules which contain at most one variable on the right-hand side. So, the associated algebraic system is linear, in particular right-linear because of commutativity and, thus, the least solution is expressible by means of a rational expression. For $k > 0$, solving the associated algebraic system bottom up, we have already determined rational expressions for the variables of the form $X^{(d)}$ and $X^{[d]}$ for $d < k$. By the structure of the unfolding, the system is again right-linear w.r.t. the remaining variables $X^{(k)}$ and $X^{[k]}$. So the claim follows.

For the second claim, assume first that $G$ is expansive. Then there is a derivation of the form $Y \Rightarrow^* w_0 Y w_1 Y w_2$ for some $Y \in \mathcal{X}$. Obviously, we can use this derivation to construct $Y$-trees of arbitrary dimension. Hence, $\mathrm{camb}_{Y^{[k]}} < \mathrm{camb}_Y$ for all $k \in \mathbb{N}$. Assume now that $G$ is nonexpansive. The definition of "nonexpansive" can be restated as: in any $X$-tree $t = \sigma t_1 t_2 \ldots t_r$, at most one child contains a node which is labeled by a rule rewriting $X$. Let $l(t)$ be the number of distinct variables $Y$ for which there is at least one node of $t$ which is labeled by a rule rewriting $Y$. Obviously, $l(t) \le |\mathcal{X}|$. Induction on $l(t)$ shows that every derivation tree $t$ satisfying this property has dimension less than $l(t)$: for $l(t) = 1$ a tree with this property cannot contain any nodes of arity two or more. Hence, its dimension is trivially zero. For $l(t) > 1$, given such an $X$-tree $t = \sigma t_1 \ldots t_r$ we can find a simple path $\pi$ leading from the root of $t$ to a leaf which visits all nodes of $t$ which are labeled by a rule rewriting $X$. Removing $\pi$ from $t$ we obtain a forest of subtrees each labeled by at most $l(t) - 1$ distinct variables, and each still having the above property. Hence, by induction each of these subtrees has dimension less than $l(t) - 1$, and, thus, $t$ has dimension less than $l(t)$. □

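We illustrate the construction in the following example.

Example 3.5.

Let $G$ be defined by the productions

\[
X \to aXXXXXX \mid bXXXXX \mid c.
\]

The abstract algebraic system associated with this grammar is

\[
X = aX^6 + bX^5 + c.
\]

Using the valuation $h(a) = 1/6$, $h(b) = 1/2$, $h(c) = 1/3$, we interpret this abstract system as the concrete system

\[
X = \tfrac{1}{6} X^6 + \tfrac{1}{2} X^5 + \tfrac{1}{3}
\]

over the $\omega$-continuous semiring $([0, \infty], +, \cdot, 0, 1)$ of nonnegative reals extended by a greatest element $\infty$ with addition and multiplication extended as in the case of $\mathbb{N}_\infty$. The least solution $\mu$ of this system, i.e. the least nonnegative root of $\tfrac{1}{6} X^6 + \tfrac{1}{2} X^5 - X + \tfrac{1}{3}$, can be shown to be neither rational nor expressible using radicals. We may approximate $\mu$ by evaluating $\mathrm{camb}_{X^{[k]}}$ under $h$. Up to commutativity, the grammar $G^{[k]}$ corresponds to the following algebraic system, where the last line applies for $k \ge 2$:

\[
\begin{array}{ll}
X^{(0)} = c & X^{[0]} = c \\
X^{(1)} = \bigl(6a (X^{[0]})^5 + 5b (X^{[0]})^4\bigr) X^{(1)} + a (X^{(0)})^6 + b (X^{(0)})^5 & X^{[1]} = X^{(1)} + X^{[0]} \\
X^{(k)} = \bigl(6a (X^{[k-1]})^5 + 5b (X^{[k-1]})^4\bigr) X^{(k)} + \sum_{j=2}^{6} \binom{6}{j} a (X^{(k-1)})^j (X^{[k-2]})^{6-j} + \sum_{j=2}^{5} \binom{5}{j} b (X^{(k-1)})^j (X^{[k-2]})^{5-j} & X^{[k]} = X^{(k)} + X^{[k-1]}
\end{array}
\]

From this, rational expressions for $\mathrm{camb}_{X^{[k]}}$ can easily be obtained:

\[
\begin{array}{ll}
\mathrm{camb}_{X^{(0)}} = c & \mathrm{camb}_{X^{[0]}} = c \\
\mathrm{camb}_{X^{(1)}} = (6ac^5 + 5bc^4)^* (ac^6 + bc^5) & \mathrm{camb}_{X^{[1]}} = \mathrm{camb}_{X^{(1)}} + \mathrm{camb}_{X^{[0]}} \\
\mathrm{camb}_{X^{(k)}} = \bigl(6a\,\mathrm{camb}_{X^{[k-1]}}^5 + 5b\,\mathrm{camb}_{X^{[k-1]}}^4\bigr)^* \Bigl( \sum_{j=2}^{6} \binom{6}{j} a\,\mathrm{camb}_{X^{(k-1)}}^{j}\,\mathrm{camb}_{X^{[k-2]}}^{6-j} + \sum_{j=2}^{5} \binom{5}{j} b\,\mathrm{camb}_{X^{(k-1)}}^{j}\,\mathrm{camb}_{X^{[k-2]}}^{5-j} \Bigr) & \mathrm{camb}_{X^{[k]}} = \mathrm{camb}_{X^{(k)}} + \mathrm{camb}_{X^{[k-1]}}
\end{array}
\]

Evaluating the first three expressions for $\mathrm{camb}_{X^{[k]}}$ under $h$ we obtain the following approximations of $\mu$:

\[
\begin{array}{rcl}
h(\mathrm{camb}_{G^{[k]},X^{[0]}}) & = & 1/3 \\
h(\mathrm{camb}_{G^{[k]},X^{[1]}}) & = & 1/3 + (6^{-1} 3^{-6} + 2^{-1} 3^{-5})(1 - 6 \cdot 6^{-1} 3^{-5} - 5 \cdot 2^{-1} 3^{-4})^{-1} = \frac{1417}{4221} \approx 0.335702 \\
h(\mathrm{camb}_{G^{[k]},X^{[2]}}) & = & \frac{10981709605561545700033}{32712506178044757018129} \approx 0.335704
\end{array}
\]

It can be shown that $h(\mathrm{camb}_{X^{[k]}})$ is exactly the $k$-th approximation obtained by applying Newton's method to $\tfrac{1}{6} X^6 + \tfrac{1}{2} X^5 - X + \tfrac{1}{3}$ starting at $X = 0$.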
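The numerical values in Example 3.5 can be checked against Newton's method directly. The sketch below runs the classical Newton iteration on $g(X) = f(X) - X$ with $f(X) = \tfrac{1}{6} X^6 + \tfrac{1}{2} X^5 + \tfrac{1}{3}$ in exact rational arithmetic, starting at 0; the function names are chosen for illustration, and the list position $k$ of an iterate is assumed to correspond to $h(\mathrm{camb}_{X^{[k]}})$.

```python
from fractions import Fraction

def f(x):
    """Right-hand side of the concrete system X = 1/6 X^6 + 1/2 X^5 + 1/3."""
    return Fraction(1, 6) * x**6 + Fraction(1, 2) * x**5 + Fraction(1, 3)

def df(x):
    """Derivative of f."""
    return x**5 + Fraction(5, 2) * x**4

def newton(steps):
    """Newton iterates for f(X) - X = 0 starting at X = 0, as exact fractions."""
    x = Fraction(0)
    iterates = []
    for _ in range(steps):
        x = x + (f(x) - x) / (1 - df(x))
        iterates.append(x)
    return iterates

approximations = newton(3)
for k, value in enumerate(approximations):
    print(k, value, float(value))
```

The first two iterates come out as $1/3$ and $1417/4221$, in agreement with the closed forms above.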



4 Speed of Convergence 

For this section, let $n$ denote the number of variables of the context-free grammar $G$. In [EKL07b] it was shown that, if $\mathrm{camb}_{X^{[n]}}(v) < \mathrm{camb}_X(v)$, then $1 \le \mathrm{camb}_{X^{[n]}}(v)$, i.e. $\mathrm{supp}(\mathrm{camb}_{X^{[n]}}) = \mathrm{supp}(\mathrm{camb}_X)$. As $\mathrm{camb}_{X^{[n]}}$ is rational, this lower bound yields an alternative proof that $c(L(G, X))$ is a regular language. In this section we extend this result to a lower bound on the speed at which $\mathrm{camb}_{X^{[k]}}$ converges to $\mathrm{camb}_X$ for $k \to \infty$.

By $l(t)$ we denote the number of variables occurring in a derivation tree $t$. The following lemma was proven in [EKL07b].

Lemma 4.1.

For every $X$-tree $t$ there is a Parikh-equivalent tree $\tilde{t}$ of dimension at most $l(t)$.

By similar arguments as before we can derive an even stronger convergence theorem:

Theorem 4.2.

Let $n$ be the number of variables of $G$. Then for all $k \ge 0$ and $v \in \mathbb{N}^{\mathcal{A}}$: $\mathrm{camb}_{X^{[n+k]}}(v) \ge \min(\mathrm{camb}_X(v), 2^{2^k})$.

Proof. Assume there is a $v \in \mathbb{N}^{\mathcal{A}}$ with $\mathrm{camb}_{X^{[n+k]}}(v) < \mathrm{camb}_X(v)$, i.e. we have some $X$-tree $t$ of dimension at least $n + k + 1$ with $c(Y(t)) = v$. We show that $t$ witnesses the existence of at least $2^{2^k}$ distinct $X$-trees of dimension at most $n + k$ with a yield that is Parikh-equivalent to $t$.

We will prove the following stronger statement which implies the statement of the theorem: if $\dim(t) \ge l(t) + k + 1$ then there exist at least $2^{2^k}$ Parikh-equivalent trees of dimension at most $l(t) + k$.

We prove the claim by induction on $|V(t)|$, the number of nodes of $t$. If $|V(t)| = 1$, then $\dim(t) = 0$ whereas $l(t) + k + 1 = k + 2 > 0$, so the claim trivially holds. Observe that if $t$ has a subtree of dimension at least $l(t) + k + 1$ we can apply the induction hypothesis to every such subtree and thus obtain altogether at least $2^{2^k}$ Parikh-equivalent trees of dimension lower than $\dim(t)$. Therefore we can restrict ourselves to the case where $\dim(t) = l(t) + k + 1$ and all subtrees have dimension at most $l(t) + k$. Note that in this case $t$ must have (at least) two subtrees $t_1, t_2$ of dimension exactly $l(t) + k$. We distinguish two cases:

• Case $l(t_1) < l(t)$ or $l(t_2) < l(t)$: Suppose w.l.o.g. $l(t_1) < l(t)$. Apply the induction hypothesis to $t_1$, since $\dim(t_1) = l(t) + k \ge l(t_1) + k + 1$, and obtain at least $2^{2^k}$ Parikh-equivalent trees of dimension at most $l(t_1) + k$. Then we apply Lemma 4.1 to every other subtree of $t$ to obtain at least $2^{2^k}$ different trees $\tilde{t}$ of dimension at most $l(t) + k$.

• Case $l(t_1) = l(t_2) = l(t)$ (this is the only case that requires actual work): Since $t_1$ has dimension $l(t) + k$ it contains a perfect binary tree of height $l(t) + k$ as a minor. The nodes of this minor on level $k$ define $2^k$ (independent) subtrees of $t_1$. Each of these $2^k$ subtrees has height at least $l(t)$, and thus by the pigeonhole principle contains a path on which some variable occurs twice. We reallocate any subset of these $2^k$ pump-trees to $t_2$, which is possible since $l(t_2) = l(t) = l(t_1)$. This changes the subtrees $t_1, t_2$ into $\tilde{t}_1, \tilde{t}_2$. Each of these $2^{2^k}$ choices produces a different tree $\tilde{t}$ (the trees differ in the subtree $\tilde{t}_1$). As in the previous case we now apply Lemma 4.1 to every subtree of $\tilde{t}$ except $\tilde{t}_1$, thereby reducing the dimension of $\tilde{t}$ to at most $\dim(t_1) = l(t) + k$ and thus obtaining at least $2^{2^k}$ different Parikh-equivalent trees of dimension at most $\dim(t_1) = l(t) + k$.

□

We state some straightforward consequences of Theorem 4.2 based on the generalization of
context-free grammars to algebraic systems. We say that an $\omega$-continuous semiring $S$ is collapsed
at some positive integer $k$ if in $S$ the identity $k = k + 1$ holds. For instance, the semirings
$\mathbb{N}_k\langle\!\langle A^*\rangle\!\rangle$ and $\mathbb{N}_k\langle\!\langle\mathbb{N}^A\rangle\!\rangle$ are collapsed at $k$. For $k = 1$, the semiring is idempotent.
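
To make the notion of a collapsed coefficient semiring concrete, here is a minimal Python sketch that models $\mathbb{N}_k$ as the integers $0,\ldots,k$ with saturating arithmetic. The function names `add_k`, `mul_k` and `is_collapsed_at` are illustrative only and do not come from the paper.

```python
def add_k(a: int, b: int, k: int) -> int:
    """Addition in N_k: ordinary addition, capped at k (so k + 1 = k)."""
    return min(k, a + b)


def mul_k(a: int, b: int, k: int) -> int:
    """Multiplication in N_k: ordinary multiplication, capped at k."""
    return min(k, a * b)


def is_collapsed_at(k: int) -> bool:
    """Check that k = k + 1 holds and that no smaller positive j has j = j + 1."""
    return add_k(k, 1, k) == k and all(add_k(j, 1, k) != j for j in range(1, k))
```

For `k = 1` the addition becomes idempotent (`add_k(1, 1, 1) == 1`), which matches the remark that the semiring is then idempotent.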

Corollary 4.3. 

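$\mathrm{camb}_{X^{[n+\log\log k]}} = \mathrm{camb}_X$ over $\mathbb{N}_k\langle\!\langle\mathbb{N}^A\rangle\!\rangle$, and $\mathrm{camb}_X$ is rational in $\mathbb{N}_k\langle\!\langle\mathbb{N}^A\rangle\!\rangle$.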
Corollary 4.4. 

The least solution of an algebraic system with associated context-free grammar $G$ and valuation
$h$ over a commutative $\omega$-continuous semiring $S$ collapsed at $k$ is $(h(\mathrm{camb}_{X^{[n+\log\log k]}}) \mid X \in \mathcal{X})$.

By the results of [EKL07b], the latter corollary is equivalent to saying that Newton's method
reaches the least solution of an algebraic system in $n$ variables over a commutative $\omega$-continuous
semiring collapsed at $k$ after at most $n + \log\log k$ steps.
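
The step bound can be illustrated numerically. By Theorem 4.2 the approximation of index $n + j$ agrees with $\mathrm{camb}_X$ modulo $k = k + 1$ as soon as $2^{2^j} \ge k$. The sketch below computes the smallest such $j$; reading $\log\log k$ as this integer (base 2, rounded up, and $0$ for $k \le 2$) is an assumption made here, and the function names are hypothetical.

```python
def extra_newton_steps(k: int) -> int:
    """Smallest j >= 0 with 2**(2**j) >= k (assumed reading of 'log log k')."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    j = 0
    while 2 ** (2 ** j) < k:
        j += 1
    return j


def newton_step_bound(n: int, k: int) -> int:
    """Bound n + j on the number of Newton steps for n variables, collapsed at k."""
    return n + extra_newton_steps(k)
```

For instance, with three variables and a semiring collapsed at 100, the sketch yields a bound of six steps, because $2^{2^3} = 256 \ge 100$ while $2^{2^2} = 16 < 100$.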

5 Semilinearity 

In the following, let $k$ denote a fixed positive integer. By Corollary 4.3 we know that $\mathrm{camb}_G$ is
rational modulo $k = k + 1$. In this section, we also give a semilinear characterization of $\mathrm{camb}_G$.
We identify in the following a word $w \in A^*$ with its Parikh vector $c(w) \in \mathbb{N}^A$.

In the idempotent setting ($k = 1$), see e.g. [Pil73, KSMl, HK99, HEIOl], the identities (i) $(x^*)^* = x^*$,
(ii) $(x + y)^* = x^*y^*$, and (iii) $(xy^*)^* = 1 + xx^*y^*$ can be used to transform any regular
expression into a regular expression in "semilinear normal form" $\sum_{i=1}^{r} w_{i,0}w_{i,1}^*\ldots w_{i,l_i}^*$ with
$w_{i,j} \in A^*$. It is not hard to deduce the following identities over $\mathbb{N}_k\langle\!\langle\mathbb{N}^A\rangle\!\rangle$, where $x^{<r}$ abbreviates
the sum $\sum_{i=0}^{r-1} x^i$, and $\mathrm{supp}(x)$ is identified with its characteristic function:

Lemma 5.1. 

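The following identities hold over $\mathbb{N}_k\langle\!\langle\mathbb{N}^A\rangle\!\rangle$:

\[
\begin{array}{lrcl}
(11) & kx & = & k\,\mathrm{supp}(x) \\[2pt]
(12) & (\gamma x)^* & = & (\gamma x)^{<\lceil\log_\gamma k\rceil} + k\,x^{\lceil\log_\gamma k\rceil}x^* \\[2pt]
(13) & (x^*)^* & = & kx^* \\[2pt]
(14) & (x+y)^* & = & (x+y)^{<k} + x^kx^* + y^ky^* + kxy(x+y)^{\max(k-2,0)}x^*y^* \\[2pt]
(15) & (xy^*)^* & = & 1 + xy^* + x^2x^* + x^2y\sum_{0\le m,j<k-2}\binom{2+m+j}{1+j}x^my^j \\
     &          &   & {} + kx^2y\,\bigl(x^{\max(k-2,0)} + y^{\max(k-2,0)}\bigr)\,x^*y^*
\end{array}
\]

for $\gamma$ any integer greater than one.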
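
Identity (14) can be sanity-checked mechanically for the special case $x = a$, $y = b$. The following Python sketch represents a series in two commuting letters as a dictionary from exponent pairs to coefficients in $\mathbb{N}_k$, truncated at a total degree `deg`. The truncation, the choice of single letters for $x$ and $y$, and all function names are assumptions made for illustration; the check is not a proof of the lemma.

```python
from itertools import product


def add(p, q, k):
    """Sum of two series with coefficients saturated at k."""
    out = dict(p)
    for m, c in q.items():
        out[m] = min(k, out.get(m, 0) + c)
    return out


def mul(p, q, k, deg):
    """Product of two series, saturated at k and truncated at total degree deg."""
    out = {}
    for (m1, c1), (m2, c2) in product(p.items(), q.items()):
        m = (m1[0] + m2[0], m1[1] + m2[1])
        if m[0] + m[1] <= deg:
            out[m] = min(k, out.get(m, 0) + c1 * c2)
    return out


def power(p, n, k, deg):
    result = {(0, 0): 1}
    for _ in range(n):
        result = mul(result, p, k, deg)
    return result


def star(p, k, deg):
    """Sum of p^n for n = 0..deg; exact up to degree deg if p has no constant term."""
    total, term = {}, {(0, 0): 1}
    for _ in range(deg + 1):
        total = add(total, term, k)
        term = mul(term, p, k, deg)
    return total


def less_than(p, r, k, deg):
    """The abbreviation p^{<r}: sum of p^i for 0 <= i < r."""
    total = {}
    for i in range(r):
        total = add(total, power(p, i, k, deg), k)
    return total


def scale(c, p, k):
    return {m: min(k, c * v) for m, v in p.items()}


def identity_14_holds(k, deg=6):
    """Compare both sides of (14) for x = a, y = b up to total degree deg."""
    x, y = {(1, 0): 1}, {(0, 1): 1}
    s = add(x, y, k)
    lhs = star(s, k, deg)
    tail = mul(mul(x, y, k, deg), power(s, max(k - 2, 0), k, deg), k, deg)
    tail = mul(mul(tail, star(x, k, deg), k, deg), star(y, k, deg), k, deg)
    rhs = less_than(s, k, k, deg)
    rhs = add(rhs, mul(power(x, k, k, deg), star(x, k, deg), k, deg), k)
    rhs = add(rhs, mul(power(y, k, k, deg), star(y, k, deg), k, deg), k)
    rhs = add(rhs, scale(k, tail, k), k)
    return lhs == rhs
```

Calling `identity_14_holds(k)` for small `k` compares the two sides coefficient by coefficient on all monomials $a^ib^j$ with $i + j \le 6$.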

Consider a rational series $r \in \mathbb{N}_k\langle\!\langle\mathbb{N}^A\rangle\!\rangle$ represented by the rational expression $\rho$. The above
identities, where (13), (14), (15) generalize (i), (ii), (iii), respectively, allow one to reduce the star
height of $\rho$ to at most one by distributing the Kleene stars over sums $(\rho_1 + \rho_2)^*$ and products
$(\rho_1\rho_2)^*$ (in the latter case if $\rho_1\rho_2 \notin A^*$), yielding a rational expression $\rho'$ of the form

\[ \rho' = \sum_{i\in[s]} \gamma_i\, w_{i,0}w_{i,1}^*\ldots w_{i,l_i}^* \qquad (w_{i,j} \in A^*,\ \gamma_i \in \mathbb{N}_k) \]

which still represents $r$ over $\mathbb{N}_k\langle\!\langle\mathbb{N}^A\rangle\!\rangle$. By (11) we know that, if $\gamma_i = k$, we may replace
$w_{i,0}w_{i,1}^*\ldots w_{i,l_i}^*$ by its support, which is a linear set in $\mathbb{N}^A$. This can be generalized to $k > 1$:

Theorem 5.2. 

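Every rational $r \in \mathbb{N}_k\langle\!\langle\mathbb{N}^A\rangle\!\rangle$ can be represented as a finite sum of weighted linear sets, i.e.

\[ r = \sum_{i\in[s]} \gamma_i\,\mathrm{supp}(w_{i,0}w_{i,1}^*\ldots w_{i,l_i}^*) \quad\text{with } w_{i,j} \in A^* \text{ and } \gamma_i \in \mathbb{N}_k. \]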

Example 5.3. 

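The rational expression $\rho = (a + 2b)^*$ represents the series $\sum_{i,j\in\mathbb{N}} \binom{i+j}{i} 2^j a^i b^j$ in $\mathbb{N}_\infty\langle\!\langle\mathbb{N}^A\rangle\!\rangle$. Computing
over $\mathbb{N}_2\langle\!\langle\mathbb{N}^A\rangle\!\rangle$ we may transform $\rho$ as follows:

\[
\begin{array}{rll}
(a+2b)^* & = (a+2b)^{<2} + a^2a^* + (2b)^2(2b)^* + 2a(2b)a^*(2b)^* & \text{by (14)} \\
 & = \varepsilon + a + 2b + a^2a^* + 2b^2b^* + 2aba^*b^* & \text{by (11)} \\
 & = a^* + 2(bb^* + aba^*b^*) & \text{by } a^* = \sum_{i\in\mathbb{N}} a^i \\
 & = a^* + 2(bb^*a^*) & \text{by } x^* = \sum_{i\in\mathbb{N}} x^i \text{ and (11)} \\
 & = a^* + 2\,\mathrm{supp}(bb^*a^*) & \text{by (11)} \\
 & = 1\,\mathrm{supp}(a^*) + 2\,\mathrm{supp}(bb^*a^*) & \text{by } a^* = \sum_{i\in\mathbb{N}} a^i.
\end{array}
\]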
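
The result of Example 5.3 can be cross-checked coefficient by coefficient. The sketch below uses the closed form $\binom{i+j}{i}2^j$ for the coefficient of $a^ib^j$ in $(a+2b)^*$, which follows from the binomial theorem in the commutative setting, and compares it modulo $2 = 3$ with the weighted sum of the two linear sets. The function names are illustrative.

```python
from math import comb

K = 2  # coefficients are computed modulo 2 = 3, i.e. saturated at 2


def star_coefficient(i: int, j: int) -> int:
    """Coefficient of a^i b^j in (a + 2b)^*, saturated at K."""
    return min(K, comb(i + j, i) * 2 ** j)


def weighted_linear_sets(i: int, j: int) -> int:
    """Value of 1*supp(a^*) + 2*supp(b b^* a^*) at a^i b^j, saturated at K."""
    in_a_star = (j == 0)          # a^i with no b
    in_bb_star_a_star = (j >= 1)  # at least one b, any number of a's
    return min(K, 1 * in_a_star + 2 * in_bb_star_a_star)
```

Both functions agree on every exponent pair, since any monomial containing at least one $b$ already has coefficient at least 2.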

Corollary 5.4. 

For every $k \in \mathbb{N}_\infty$ we can construct a formula of Presburger arithmetic that represents the set
$\{v \in \mathbb{N}^A \mid \mathrm{camb}_{G,X}(v) = k\}$.

A Missing proofs



Proof of Lemma 3.2

Let $t$ be a derivation tree of dimension $\dim(t) = d$. Then $t = \sigma t_1\ldots t_r$ has at most one child
$t_c$ ($c \in [r]$) with $\dim(t) = \dim(t_c)$ by definition of $\dim$. Hence, there is a unique maximal path
$v_0\ldots v_l$ starting in $v_0 = \varepsilon$ such that (i) $\dim(t) = \dim(t|_{v_l})$ and (ii) either $v_l$ is a leaf of $t$ or every
proper subtree of $t|_{v_l}$ has dimension less than $d$. Let $\mathrm{dlen}(t) = l$ denote the length of this unique
path. Further, we use $\mathrm{dchar}(t) = \{(i, \dim(t'_i)) \mid i \in [r']\}$ for $t|_{v_l} = \sigma' t'_1\ldots t'_{r'}$ to remember the
dimensions of the children of $v_l$. ($\mathrm{dchar}(t) = \emptyset$ if $v_l$ is a leaf of $t$.)
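
The quantities $\dim$ and $\mathrm{dlen}$ can be sketched in a few lines of Python. The definition of $\dim$ is not restated in this appendix, so the sketch assumes the usual one: a leaf has dimension 0, and an inner node has the maximal dimension $d$ of its children, increased by one if at least two children attain $d$. Trees are modelled as nested lists of children, which is a representation chosen here for illustration.

```python
def dim(tree):
    """Dimension of a tree given as a nested list of child trees (assumed definition)."""
    if not tree:
        return 0
    dims = sorted((dim(child) for child in tree), reverse=True)
    if len(dims) >= 2 and dims[0] == dims[1]:
        return dims[0] + 1
    return dims[0]


def dlen(tree):
    """Length of the path from the root along children of the same dimension."""
    d = dim(tree)
    for child in tree:
        if dim(child) == d:
            # At most one child can have the dimension of its parent.
            return 1 + dlen(child)
    return 0
```

A chain has dimension 0 and its `dlen` equals its height, while a node with two leaf children has dimension 1 and `dlen` 0.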

We first construct a mapping $\hat{\cdot}$ from the derivation trees of $G^{[d]}$ to the derivation trees of $G$ of
dimension at most $d$ and exactly $d$, respectively:

• If $t = \sigma_{X^{[d]},X^{(e)}}\,t_1$ then $\hat{t} := \hat{t}_1$.

• If $t = \sigma_{X^{(d)},\,u_0Z_1u_1\ldots u_{r-1}Z_ru_r}\,t_1\ldots t_r$, then $\hat{t} := \sigma_{X,\,u_0X_1u_1\ldots u_{r-1}X_ru_r}\,\hat{t}_1\ldots\hat{t}_r$ where $X_i \in \mathcal{X}$ is
the variable from which $Z_i \in \mathcal{X}^{[d]}$ was derived.

Informally, $\hat{\cdot}$ contracts edges induced by rules $X^{[d]} \to X^{(e)}$ which choose a concrete dimension
$e \le d$, and then forgets the superscripts. By definition, the rules of $G^{[d]}$ which rewrite the variable
$X^{(d)}$ are obtained from the rules of $G$ which rewrite the variable $X$ by only adding superscripts.
Hence, $\hat{\cdot}$ maps any $X^{[d]}$-tree and any $X^{(d)}$-tree to an $X$-tree while preserving its yield ($Y(t) = Y(\hat{t})$).
Further, as the edges induced by the rules $X^{[d]} \to X^{(e)}$ do not influence the tree dimension, we
also have $\dim(\hat{t}) = \dim(t)$ and $\mathrm{dchar}(\hat{t}) = \mathrm{dchar}(t)$. We also have $\mathrm{dlen}(t) \ge \mathrm{dlen}(\hat{t})$ as contracting
the edges induced by $X^{[d]} \to X^{(e)}$ can only reduce $\mathrm{dlen}(\cdot)$.

We claim that $\hat{\cdot}$ maps the set of $X^{[d]}$-trees ($X^{(d)}$-trees) one-to-one onto the set of $X$-trees of dimension
at most $d$ (exactly $d$). We proceed by induction on $d$. Let $d = 0$.

• Consider an $X^{(0)}$-tree $t$. The only rules rewriting $X^{(0)}$ are of the form $X^{(0)} \to u$ or
$X^{(0)} \to uY^{(0)}v$ (for $u, v \in A^*$ and $Y \in \mathcal{X}$). For these rules, forgetting the superscript is an injective
operation. Hence, $\hat{\cdot}$ is injective on the set of $X^{(0)}$-trees. Obviously, $\hat{t}$ is also a chain, and,
thus, $0 = \dim(t) = \dim(\hat{t})$. (In fact, $\mathrm{dlen}(\hat{t}) = \mathrm{dlen}(t)$.)

• Consider now an $X^{[0]}$-tree $t$. By definition of $G^{[d]}$, $X^{[0]}$ can only be rewritten to $X^{(0)}$. So
$t = \sigma_{X^{[0]},X^{(0)}}\,t_1$ for $t_1$ an $X^{(0)}$-tree, and $\hat{t} = \hat{t}_1$. Again, $0 = \dim(\hat{t}) = \dim(t)$.

• Let $t$ be an $X^{(d)}$-tree for $d > 0$ where $t = \sigma_{X^{(d)},\,u_0Z_1u_1\ldots u_{r-1}Z_ru_r}\,t_1\ldots t_r$ for some $r > 0$ where
there is a rule $X \to u_0X_1u_1\ldots u_{r-1}X_ru_r$ in $G$ ($X_i \in \mathcal{X}$, $u_i \in A^*$) such that for all $i \in [r]$
either $Z_i \in \{X_i^{(d)}, X_i^{[d-1]}\}$ or $Z_i \in \{X_i^{(d-1)}, X_i^{[d-2]}\}$.

Assume first that $t$ has no $Y^{(d)}$-subtree for any $Y \in \mathcal{X}$, i.e. $t$ is an $X^{(d)}$-tree of minimal
height. Then the root of $t$ is labelled $\sigma_{X^{(d)},\,u_0Z_1\ldots Z_ru_r}$ where $Z_i = X_i^{(d-1)}$ or, if $d \ge 2$, $Z_i = X_i^{[d-2]}$ for some
$X_i \in \mathcal{X}$ such that $X \to u_0X_1\ldots X_ru_r$ in $G$. Inductively, we already know that $\dim(\hat{t}') = e$
($\dim(\hat{t}') \le e$) for every $X^{(e)}$-tree ($X^{[e]}$-tree) $t'$ and all $e < d$. Hence, $\dim(\hat{t}) = \dim(t) = d$ and
$\mathrm{dlen}(\hat{t}) = \mathrm{dlen}(t) = 0$.

Thus, assume that $t$ contains a $Y^{(d)}$-subtree for some $Y \in \mathcal{X}$. By construction, there
occurs at most one "$(d)$-variable", i.e. a variable of $\{Y^{(d)} \mid Y \in \mathcal{X}\}$, in the right-hand
side $\gamma$ of every rule $X^{(d)} \to \gamma$. Hence, there is a unique $j \in [r]$ such that
$Z_j = X_j^{(d)}$, while $Z_i = X_i^{[d-1]}$ for all $i \in [r] - \{j\}$. Then the $X_j^{(d)}$-tree $t_j$ has height less
than $t$, so by induction on the height of $X^{(d)}$-trees, we have $\dim(\hat{t}_j) = \dim(t_j) = d$ and
$\mathrm{dlen}(\hat{t}_j) = \mathrm{dlen}(t_j)$. By induction on $d$, we also know that $\dim(\hat{t}_i) < d$ ($i \in [r] - \{j\}$).
Hence, $\dim(\hat{t}) = d$ and $\mathrm{dlen}(\hat{t}) = \mathrm{dlen}(\hat{t}_j) + 1$. As the edge to $t_j$ is not contracted by $\hat{\cdot}$, also
$\mathrm{dlen}(t) = \mathrm{dlen}(t_j) + 1 = \mathrm{dlen}(\hat{t}_j) + 1 = \mathrm{dlen}(\hat{t})$.

Assume now that $\hat{t} = \hat{t}'$ for two $X^{(d)}$-trees $t, t'$. Then $\dim(t) = \dim(t') = \dim(\hat{t})$, $\mathrm{dlen}(t) =
\mathrm{dlen}(t') = \mathrm{dlen}(\hat{t})$, and $\mathrm{dchar}(t) = \mathrm{dchar}(t') = \mathrm{dchar}(\hat{t})$. Let the root of $\hat{t}$ be labelled $\sigma_{X,\,u_0X_1u_1\ldots u_{r-1}X_ru_r}$. Then
necessarily, the roots of $t$ and $t'$ are labelled $\sigma_{X^{(d)},\,u_0Z_1\ldots Z_ru_r}$ and $\sigma_{X^{(d)},\,u_0Z'_1\ldots Z'_ru_r}$, respectively, with either $Z_i \in \{X_i^{(d)}, X_i^{[d-1]}\}$
or $Z_i \in \{X_i^{(d-1)}, X_i^{[d-2]}\}$, and, analogously, for all $Z'_i$, as $\hat{\cdot}$ only forgets superscripts and
removes the symbols $\sigma_{X^{[d]},X^{(e)}}$.

If $\mathrm{dlen}(\hat{t}) = 0$, then $t, t', \hat{t}$ have only subtrees of dimension at most $d - 1$. By definition of
$G^{[d]}$, it follows that $Z_i, Z'_i \in \{X_i^{(d-1)}, X_i^{[d-2]}\}$. By induction, we know that only $(d-1)$-variables
can generate trees of dimension $d - 1$, hence, necessarily $Z_i = Z'_i = X_i^{(d-1)}$ for
all children $i \in [r]$ of $\hat{t}$ which have dimension exactly $d - 1$, while $Z_i = Z'_i = X_i^{[d-2]}$ for all
remaining children. Again by induction, we know that $\hat{\cdot}$ is injective on sets of $Y^{(d-1)}$-trees
and $Y^{[d-2]}$-trees, respectively. Hence, $t = t'$.

Finally, assume $\mathrm{dlen}(\hat{t}) > 0$. Then $\hat{t}$ has a unique child $\hat{t}|_i$ of dimension $d$, while $\dim(\hat{t}|_j) < d$
for $j \in [r] - \{i\}$. Consequently, $Z_i = Z'_i = X_i^{(d)}$ and $Z_j = Z'_j = X_j^{[d-1]}$ for $j \in [r] - \{i\}$ by
definition of $G^{[d]}$. By induction on $d$ and $\mathrm{dlen}(\hat{t})$, we may assume that $\hat{\cdot}$ is injective on the
subtrees of $t$ and $t'$, hence, $t = t'$ follows.

It remains to show that for any $X$-tree $t'$ of dimension exactly $d$ (at most $d$), there is an $X^{(d)}$-tree
($X^{[d]}$-tree) $t$ such that $\hat{t} = t'$. To this end, we define an operator $\check{\cdot}$ which maps an $X$-tree of
dimension exactly $d$ to an $X^{(d)}$-tree by, essentially, introducing the superscripts into a symbol
$\sigma_{X,\,u_0X_1\ldots X_ru_r}$ as required by the dimensions of the subtrees $t_1, \ldots, t_r$.

Let $t = \sigma_{X,\,u_0X_1u_1\ldots X_ru_r}\,t_1\ldots t_r$ with $d = \dim(t)$ and $d_i = \dim(t_i)$, then

\[ \check{t} := \sigma_{X^{(d)},\,u_0Z_1u_1\ldots Z_ru_r}\,t'_1\ldots t'_r \]

where $Z_i, t'_i$ are defined as follows:

• If $d > \max_{i\in[r]} d_i$, then let $J = \{i \in [r] \mid d_i = d - 1\}$ and set $Z_i := X_i^{(d-1)}$ and
$t'_i := \check{t}_i$ if $i \in J$, and $Z_i := X_i^{[d-2]}$ and $t'_i := \sigma_{X_i^{[d-2]},X_i^{(d_i)}}\,\check{t}_i$ otherwise.

• If $d = \max_{i\in[r]} d_i$, then there is a unique $j \in [r]$ such that $d_j = d$. Set $Z_j := X_j^{(d)}$
and $t'_j := \check{t}_j$. For the remaining $i \in [r] - \{j\}$, set $Z_i := X_i^{[d-1]}$ and $t'_i := \sigma_{X_i^{[d-1]},X_i^{(d_i)}}\,\check{t}_i$.

It is straightforward to check that $\check{t}$ is indeed an $X^{(d)}$-tree for $\dim(t) = d$, and that applying $\hat{\cdot}$ to $\check{t}$ yields $t$ again. Obviously,
$\check{\cdot}$ is injective. Finally, for every $d' \ge d$ there is exactly one rule $X^{[d']} \to X^{(d)}$. Hence, $\sigma_{X^{[d']},X^{(d)}}\,\check{t}$
is, by definition of $G^{[d]}$, the unique $X^{[d']}$-tree which is mapped by $\hat{\cdot}$ back to $t$.

Proof of Lemma 5.1

The proofs are straightforward, and essentially only require unrolling and cutting off the power series
underlying the Kleene star using the $\omega$-continuity of the Kleene star and the assumption that
$k = k + 1$. We make use several times of the trivial bound $\binom{a}{b} \ge a$ for $0 < b < a$ on the binomial
coefficient.

(11) $kx = k\,\mathrm{supp}(x)$ is obviously true modulo $k = k + 1$.

(12) $(\gamma x)^* = (\gamma x)^{<\lceil\log_\gamma k\rceil} + k \cdot x^{\lceil\log_\gamma k\rceil}x^*$

This follows from the $\omega$-continuity of the star, $(\gamma x)^* = \sum_{n\in\mathbb{N}}(\gamma x)^n$, and the first identity.
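
For a single letter $x$, identity (12) reduces to a statement about the coefficients $\gamma^n$, which can be checked directly. The sketch below assumes that $x$ is a single letter (so that the coefficient of $x^n$ in $(\gamma x)^*$ is $\gamma^n$) and computes $\lceil\log_\gamma k\rceil$ with integer arithmetic; the function names are hypothetical.

```python
def ceil_log(gamma: int, k: int) -> int:
    """Smallest e >= 0 with gamma**e >= k, i.e. the ceiling of log_gamma(k)."""
    e = 0
    while gamma ** e < k:
        e += 1
    return e


def lhs_coefficient(gamma: int, k: int, n: int) -> int:
    """Coefficient of x^n in (gamma*x)^*, reduced modulo k = k + 1."""
    return min(k, gamma ** n)


def rhs_coefficient(gamma: int, k: int, n: int) -> int:
    """Coefficient of x^n in (gamma*x)^{<e} + k * x^e x^* with e = ceil_log(gamma, k)."""
    e = ceil_log(gamma, k)
    return gamma ** n if n < e else k
```

The two coefficient functions coincide because $\gamma^n < k$ exactly for $n < \lceil\log_\gamma k\rceil$.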

(13) $(x^*)^* = kx^*$

Choose any $w \in \mathrm{supp}((x^*)^*)$. Then $w$ can be factorized into $w = u_1\ldots u_l$ with $u_i \in
\mathrm{supp}(x^*)$, i.e., $w \in \mathrm{supp}((x^*)^l)$. Obviously, we can then also find a factorization of $w$ into
$l + i$ words for any $i \ge 0$, as we may add an arbitrary number of neutral elements $\varepsilon$ into
this factorization. Hence, $w \in \mathrm{supp}((x^*)^{l+i})$ for all $i \ge 0$. So, the coefficient of $w$ in $(x^*)^*$
is $\infty = k$ modulo $k = k + 1$.

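(14) $(x+y)^* = (x+y)^{<k} + x^kx^* + y^ky^* + kxy(x+y)^{\max(k-2,0)}x^*y^*$

Proof:

\[
\begin{array}{rl}
(x+y)^* & = (x+y)^{<k} + \sum_{n\ge k}(x+y)^n \\[2pt]
(xy = yx) & = (x+y)^{<k} + \sum_{n\ge k}\sum_{i=0}^{n}\binom{n}{i}x^iy^{n-i} \\[2pt]
 & = (x+y)^{<k} + \sum_{n\ge k}\Bigl(x^n + y^n + \sum_{i=1}^{n-1}\binom{n}{i}x^iy^{n-i}\Bigr) \\[2pt]
 & = (x+y)^{<k} + x^kx^* + y^ky^* + \sum_{n\ge k}\sum_{i=1}^{n-1}\binom{n}{i}x^iy^{n-i} \\[2pt]
(i \mapsto i+1,\ n = m+2) & = (x+y)^{<k} + x^kx^* + y^ky^* + \sum_{m\ge\max(k-2,0)}\sum_{i=0}^{m}\binom{m+2}{i+1}x^{i+1}y^{m-i+1} \\[2pt]
\bigl(\binom{m+2}{i+1} \ge k,\ (11)\bigr) & = (x+y)^{<k} + x^kx^* + y^ky^* + kxy\sum_{m\ge\max(k-2,0)}\sum_{i=0}^{m}\binom{m}{i}x^iy^{m-i} \\[2pt]
 & = (x+y)^{<k} + x^kx^* + y^ky^* + kxy(x+y)^{\max(k-2,0)}(x+y)^* \\[2pt]
\bigl(\text{(ii) } \mathrm{supp}((x+y)^*) = \mathrm{supp}(x^*y^*),\ (11)\bigr) & = (x+y)^{<k} + x^kx^* + y^ky^* + kxy(x+y)^{\max(k-2,0)}x^*y^*
\end{array}
\]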



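(15) $(xy^*)^* = 1 + xy^* + x^2x^* + x^2y\sum_{0\le m,j<k-2}\binom{2+m+j}{1+j}x^my^j + kx^2y\,x^{\max(k-2,0)}x^*y^* + kx^2y\,x^*y^{\max(k-2,0)}y^*$

Proof:

\[
\begin{array}{rl}
(xy^*)^* \quad (xy^* = y^*x) & = 1 + xy^* + \sum_{n\ge 2} x^n(y^*)^n \\[2pt]
 & = 1 + xy^* + x^2x^* + \sum_{n\ge 2,\,l\ge 1}\binom{n+l-1}{l}x^ny^l \\[2pt]
(n = m+2,\ l = j+1,\ xy = yx) & = 1 + xy^* + x^2x^* + x^2y\sum_{m,j\ge 0}\binom{2+m+j}{1+j}x^my^j \\[2pt]
 & = 1 + xy^* + x^2x^* + x^2y\sum_{0\le m,j<k-2}\binom{2+m+j}{1+j}x^my^j \\
 & \quad {} + x^2y\sum_{m\ge k-2\,\vee\,j\ge k-2}\binom{2+m+j}{1+j}x^my^j \\[2pt]
(k = k+1) & = 1 + xy^* + x^2x^* + x^2y\sum_{0\le m,j<k-2}\binom{2+m+j}{1+j}x^my^j \\
 & \quad {} + kx^2y\sum_{m\ge k-2\,\vee\,j\ge k-2}x^my^j \\[2pt]
 & = 1 + xy^* + x^2x^* + x^2y\sum_{0\le m,j<k-2}\binom{2+m+j}{1+j}x^my^j \\
 & \quad {} + kx^2y\,x^{\max(k-2,0)}x^*y^* + kx^2y\,x^*y^{\max(k-2,0)}y^*
\end{array}
\]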



Proof of Theorem 5.2

We identify a word $w \in A^*$ with its Parikh vector $c(w) \in \mathbb{N}^A$. We show that, if $\mathrm{supp}(w_1^*\ldots w_l^*) \ne
w_1^*\ldots w_l^*$ in $\mathbb{N}_k\langle\!\langle\mathbb{N}^A\rangle\!\rangle$, then we can split the linear term into a finite sum of weighted linear terms
where in each linear term with weight less than $k$ the number of Kleene stars is strictly less than
$l$. Then the result follows inductively.

W.l.o.g. we may assume that each $w_i \ne \varepsilon$, i.e. $c(w_i) \ne 0$, as $\varepsilon^* = \infty = k$. Denote by $M \in
\mathbb{N}^{l\times A}$ the matrix whose $i$-th row is given by $c(w_i)$ (w.r.t. some chosen order on $A$), and let
$\lambda = (\lambda_1, \ldots, \lambda_l) \in \mathbb{N}^l$. Then the coefficient $c_v := (w_1^*\ldots w_l^*, v)$ is exactly the number of solutions
over $\mathbb{N}^l$ of the linear equation $v = \lambda M$. If the set $\{c(w_1), c(w_2), \ldots, c(w_l)\}$ is linearly independent,
then trivially $c_v \le 1$ and we are done.

Assume thus that the set $\{c(w_1), c(w_2), \ldots, c(w_l)\}$ is linearly dependent, i.e. there is some kernel
vector $n = (n_1, \ldots, n_l) \in \mathbb{Z}^l \setminus \{0\}$. Let $I_+ = \{i \in [l] \mid n_i > 0\}$, $I_- = \{i \in [l] \mid n_i < 0\}$, and
$I_0 = \{i \in [l] \mid n_i = 0\}$. As all components of $M$ are nonnegative, $n$ necessarily has a positive
and a negative component, i.e. $I_+ \ne \emptyset \ne I_-$. Let $\|n\|_\infty := \max_{i\in[l]} |n_i|$ and $C := \|n\|_\infty \cdot (k - 1)$.

Consider now any $\lambda = (\lambda_1, \ldots, \lambda_l) \in \mathbb{N}^l$ with $\lambda_i \ge C$ for all $i \in I_+$. Then also $\lambda - in \in \mathbb{N}^l$ for
$i = 0, \ldots, k - 1$ and trivially $v = \lambda M = (\lambda - in)M$, which implies that $c_v \ge k$. If $\lambda_i \ge C$ for all
$i \in I_-$, consider analogously $\lambda + in$. For $I \in \{I_+, I_-\}$ we split the series $\prod_{i\in I} w_i^*$ into the series $s_I$
and $t_I$ defined by

\[ s_I := \prod_{i\in I}(w_i^Cw_i^*) \quad\text{and}\quad t_I := \sum_{\emptyset\ne J\subseteq I}\ \prod_{i\in J} w_i^{<C} \prod_{i\in I-J} w_i^Cw_i^*. \]
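
The counting argument above can be tried out on a small instance. The sketch below takes $w_1 = a$, $w_2 = b$, $w_3 = ab$ (an example chosen here, not taken from the paper), whose Parikh vectors are linearly dependent with kernel vector $n = (1, 1, -1)$, counts the solutions of $v = \lambda M$ by brute force, and lists the shifted solutions $\lambda - in$. The helper names are hypothetical.

```python
from itertools import product


def image(lam, rows):
    """The vector lam * M, where the rows of M are the Parikh vectors in `rows`."""
    width = len(rows[0])
    return tuple(sum(l * r[c] for l, r in zip(lam, rows)) for c in range(width))


def count_solutions(v, rows):
    """Number of lam in N^l with lam * M == v (brute force, rows must be nonzero)."""
    bound = max(v)  # every nonzero row forces lam_i <= max(v)
    return sum(
        1
        for lam in product(range(bound + 1), repeat=len(rows))
        if image(lam, rows) == tuple(v)
    )


def shifted_solutions(lam, n, k):
    """The vectors lam - i*n for i = 0..k-1."""
    return [tuple(l - i * c for l, c in zip(lam, n)) for i in range(k)]
```

With `k = 3` we get `C = 2`, and `lam = (2, 2, 0)` satisfies the condition on $I_+$; its three shifts are all nonnegative and have the same image $(2, 2)$, so the coefficient of $a^2b^2$ in $a^*b^*(ab)^*$ is at least 3.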
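As discussed above, every vector in the support of $s_I\prod_{i\in[l]-I} w_i^*$ (for $I \in \{I_+, I_-\}$) has a coefficient greater
than or equal to $k$ in $w_1^*\ldots w_l^*$. Hence, within this product, $s_I$ may be replaced by $k\,s_I$ over $\mathbb{N}_k\langle\!\langle\mathbb{N}^A\rangle\!\rangle$:

\[
\begin{array}{rl}
w_1^*\ldots w_l^* & = \prod_{i\in I_0} w_i^*\,(k\,s_{I_+} + t_{I_+})(k\,s_{I_-} + t_{I_-}) \\[4pt]
 & = \prod_{i\in I_0} w_i^*\,t_{I_+}t_{I_-} + k\prod_{i\in I_0} w_i^*\Bigl(\prod_{i\in I_+} w_i^C + \prod_{i\in I_-} w_i^C\Bigr)\prod_{i\in I_+\cup I_-} w_i^* \\[4pt]
 & = k\Bigl(\prod_{i\in I_+} w_i^C + \prod_{i\in I_-} w_i^C\Bigr)\prod_{i\in[l]} w_i^* + \prod_{i\in I_0} w_i^*\,t_{I_+}t_{I_-}.
\end{array}
\]

By (11), the first summand may be replaced by $k$ times its support. It remains to consider the second summand, which can be written as a finite sum of products
each of which contains at most $|[l] - (J_+ \cup J_-)| \le l - 2$ Kleene stars:

\[
\prod_{i\in I_0} w_i^*\,t_{I_+}t_{I_-} = \sum_{\substack{\emptyset\ne J_+\subseteq I_+ \\ \emptyset\ne J_-\subseteq I_-}}\ \prod_{i\in J_+\cup J_-} w_i^{<C} \prod_{i\in(I_+-J_+)\cup(I_--J_-)} w_i^C \prod_{i\in[l]-(J_+\cup J_-)} w_i^*.
\]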



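Proof of Corollary 5.4

As $c(L(G,X)) = \mathrm{supp}(\mathrm{camb}_{G,X}) = \{v \in \mathbb{N}^A \mid \mathrm{camb}_{G,X}(v) > 0\}$ is semilinear by Parikh's
theorem, it is effectively representable by a formula of Presburger arithmetic, and so is its
complement ($k = 0$).

Assume thus $1 \le k < \infty$ and let $K = k + 1$. Then we may compute from $\mathrm{camb}_{X^{[n+\log\log K]}}$ a weighted
semilinear representation of $\mathrm{camb}_X$ modulo $K = K + 1$:

\[ \mathrm{camb}_X = \sum_{i=1}^{r} \gamma_i\,\mathrm{supp}(v_{i,0}v_{i,1}^*\ldots v_{i,l_i}^*) \quad\text{with } \gamma_i \in \mathbb{N}_K \text{ and } v_{i,j} \in \mathbb{N}^A. \]

From each term $\mathrm{supp}(v_{i,0}v_{i,1}^*\ldots v_{i,l_i}^*)$ we can construct an equivalent Presburger formula $F_i$.
Then $\mathrm{camb}_X(v) = k$ if and only if

\[ \exists y_1, \ldots, y_r\colon\ \sum_{i=1}^{r} \gamma_iy_i = k \ \wedge\ \bigwedge_{i=1}^{r}\bigl((F_i(v) \to y_i = 1) \wedge (\neg F_i(v) \to y_i = 0)\bigr). \]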
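
Since each $y_i$ is determined by whether $F_i(v)$ holds, the formula can be evaluated directly once membership tests for the linear sets are available. The following sketch does this for a weighted representation given as a list of pairs (weight, membership test). As sample data it reuses the two linear sets from Example 5.3 with $K = 2$; that example is a rational series rather than the commutative ambiguity of a concrete grammar, so it only illustrates the evaluation. All names are hypothetical.

```python
def camb_equals(v, k, terms):
    """Evaluate: exists y_1..y_r with sum gamma_i*y_i = k and y_i = 1 iff F_i(v)."""
    # The y_i are forced by v, so the existential quantifier needs no search.
    ys = [1 if member(v) else 0 for _, member in terms]
    return sum(gamma * y for (gamma, _), y in zip(terms, ys)) == k


# Vectors are (number of a's, number of b's); weights are taken modulo K = 2.
EXAMPLE_TERMS = [
    (1, lambda v: v[1] == 0),  # supp(a^*)
    (2, lambda v: v[1] >= 1),  # supp(b b^* a^*)
]
```

With this data, `camb_equals((3, 0), 1, EXAMPLE_TERMS)` holds, while vectors containing a `b` have the saturated value 2 and therefore fail the test for `k = 1`.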

Finally, let $k = \infty$. As for any $v \in \mathbb{N}^A$ there are only finitely many $w \in A^*$ with $c(w) = v$, we
have $\mathrm{camb}_{G,X}(v) = \infty$ if and only if there is a $w \in A^*$ with $c(w) = v$ and $\mathrm{amb}_{G,X}(w) = \infty$. We
therefore construct from $G = (\mathcal{X}, A, P)$ a context-free grammar $G' = (\mathcal{X}', A, P')$ with $\mathcal{X} \subseteq \mathcal{X}'$
such that $L(G', X) = \{w \in A^* \mid \mathrm{amb}_{G,X}(w) = \infty\}$. Then $\{v \in \mathbb{N}^A \mid \mathrm{camb}_{G,X}(v) = \infty\} =
c(L(G', X))$ and is a semilinear set by Parikh's theorem, where the corresponding Presburger
formula is again effectively constructible.

We discuss the construction of $G'$ for the sake of completeness: we have $\mathrm{amb}_{G,X}(w) = \infty$ if and
only if there are infinitely many $X$-trees $t$ with $Y(t) = w$. In particular, for every $h \in \mathbb{N}$ we can
find an $X$-tree $t$ of height at least $h$ with $Y(t) = w$, as there are only finitely many $X$-trees of bounded
height. For instance, choose $h \ge (|w| + 1)\,|\mathcal{X}|$ and consider a maximal path $v_0\ldots v_h$ from the
root of such a $t$ to a leaf. For all $i = 0, \ldots, h$ assume $t|_{v_i}$ is an $X_i$-tree ($X = X_0$). This path then
corresponds to a derivation of the form

\[ X = X_0 \Rightarrow^* u_0X_1v_0 \Rightarrow^* \ldots \Rightarrow^* u_0\ldots u_{h-1}X_hv_{h-1}\ldots v_0 \Rightarrow^* u_0\ldots u_{h-1}u_hv_hv_{h-1}\ldots v_0 = w \]

for suitable $u_i, v_i \in A^*$. In the sequence $X_0, X_1, \ldots, X_h$ color $X_i$ black if $|u_iv_i| = 0$; otherwise
color $X_i$ red. Then there are at most $|w|$ red variables in this sequence. In particular, there is a
subsequence $X_i, X_{i+1}, \ldots, X_{i+|\mathcal{X}|}$ consisting of $1 + |\mathcal{X}|$ consecutive black variables, as otherwise
$h + 1 \le (|w| + 1)\,|\mathcal{X}|$. Hence, the derivation contains a cyclic derivation $Y \Rightarrow^+ Y$.

Therefore compute the set $\mathcal{X}_c = \{X \in \mathcal{X} \mid X \Rightarrow^+ X\}$ of cyclic variables as usual, and define
$G'$ such that a derivation can only terminate in a word if the derivation visits at least one cyclic
variable:

• Set $\mathcal{X}' = \{X, X' \mid X \in \mathcal{X}\}$ with the intended meaning that an unprimed variable still has
to be derived into a sentential form containing at least one cyclic variable $Y \in \mathcal{X}_c$.

• Construct $P'$ as follows:

- If $X \to_G u_0$ for $u_0 \in A^*$, then $X' \to_{G'} u_0$.

- If $X \to_G u_0X_1u_1X_2u_2\ldots u_{r-1}X_ru_r$ for $r > 0$ and $u_i \in A^*$, then

$X' \to_{G'} u_0X'_1u_1X'_2u_2\ldots u_{r-1}X'_ru_r$

and

$X \to_{G'} u_0X_1u_1X'_2u_2\ldots u_{r-1}X'_ru_r$
$X \to_{G'} u_0X'_1u_1X_2u_2\ldots u_{r-1}X'_ru_r$
$\vdots$
$X \to_{G'} u_0X'_1u_1X'_2u_2\ldots u_{r-1}X_ru_r$

- If $X \in \mathcal{X}_c$, then $X \to_{G'} X'$.
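
The construction of $P'$ can be sketched in Python. A grammar is given here as a set of variable names and a list of rules `(lhs, rhs)` with `rhs` a tuple of symbols; every symbol that is not a variable counts as a terminal, and the primed copy of a variable is written by appending an apostrophe. This encoding, and the way cyclic variables are computed (via nullable variables and a transitive closure), are assumptions of the sketch; the text only says that $\mathcal{X}_c$ is computed "as usual".

```python
def nullable_variables(variables, rules):
    """Variables that derive the empty word."""
    nullable, changed = set(), True
    while changed:
        changed = False
        for lhs, rhs in rules:
            if lhs not in nullable and all(s in nullable for s in rhs):
                nullable.add(lhs)
                changed = True
    return nullable


def cyclic_variables(variables, rules):
    """Variables X with X =>+ X."""
    nullable = nullable_variables(variables, rules)
    reach = {x: set() for x in variables}
    for lhs, rhs in rules:
        for i, sym in enumerate(rhs):
            rest = rhs[:i] + rhs[i + 1:]
            # lhs => sym in one or more steps if everything else can vanish.
            if sym in variables and all(s in nullable for s in rest):
                reach[lhs].add(sym)
    changed = True
    while changed:  # transitive closure
        changed = False
        for x in variables:
            new = set().union(*(reach[y] for y in reach[x])) - reach[x]
            if new:
                reach[x] |= new
                changed = True
    return {x for x in variables if x in reach[x]}


def prime(symbol):
    return symbol + "'"


def build_rules(variables, rules):
    """Rules P' of G' as described in the construction above."""
    new_rules = set()
    for lhs, rhs in rules:
        primed = tuple(prime(s) if s in variables else s for s in rhs)
        new_rules.add((prime(lhs), primed))  # all variables primed
        for j, sym in enumerate(rhs):
            if sym in variables:  # keep exactly the j-th variable unprimed
                new_rules.add((lhs, primed[:j] + (sym,) + primed[j + 1:]))
    for x in cyclic_variables(variables, rules):
        new_rules.add((x, (prime(x),)))
    return new_rules
```

For the grammar with the rules $S \to S$ and $S \to a$, the variable $S$ is cyclic, and the resulting rules allow the derivation $S \Rightarrow S' \Rightarrow a$, in line with $\mathrm{amb}_{G,S}(a) = \infty$.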

By construction, an unprimed variable $Y$ can only be rewritten to a sentential form containing
exactly one unprimed variable, unless $Y$ is cyclic in $G$, in which case the rule $Y \to_{G'} Y'$ can also
be applied.

Then $w \in L(G', X)$ if and only if there is a derivation $X \Rightarrow^*_{G'} uYv \Rightarrow_{G'} uY'v \Rightarrow^*_{G'} w$, as
only primed variables can be rewritten to terminal words. By construction, this is equivalent to
$X \Rightarrow^*_G uYv \Rightarrow^*_G w$ and $Y \in \mathcal{X}_c$, which in turn is equivalent to $\mathrm{amb}_{G,X}(w) = \infty$.